Creel surveys allow fisheries scientists and managers to collect data on catch and harvest, the angler population (including effort expended), and, depending on survey design, biological data on fish populations. Although creel surveys are an important means of collecting data on a fishery's user base, they are difficult to implement and receive little attention in graduate fisheries programs. As a result, fisheries managers (the first job for many fisheries-program graduates) often inherit old surveys or are asked to institute new ones with little knowledge of how to do so.
Fisheries can cover large spatial extents: large reservoirs, coastlines, and river systems. A creel survey has to be statistically valid, adaptable to the geographic challenges of the fishery, and cost efficient. Limited budgets can prevent agencies from implementing creel surveys; the AnglerCreelSurveySimulation package was designed to help managers explore the type of creel survey that is most appropriate for their fishery, including fisheries with multiple access points, access points that are more popular than others, variation in catch rate, different numbers of surveyors, and seasonal variation in day length.
The AnglerCreelSurveySimulation package does require that users know something about their fishery and the human dimensions of that fishery. Prior knowledge includes the mean trip length for a party (or individual) and the mean catch rate of the fishery.
The AnglerCreelSurveySimulation package is simple but powerful. Four functions provide the means for users to create a population of anglers, limit the length of the fishing day to any value, and provide a mean trip length for the population. Ultimately, the user only needs to know the final function, conduct_multiple_surveys(), but because I'd rather this not be a black box of functions, this brief introduction will walk step by step through the package.
This tutorial assumes that we have a very simple, small fishery with only one access point that, on any given day, is visited by 100 anglers. The fishing day length for our theoretical fishery is 12 hours (say, from 6 am to 6 pm), and all anglers are required to have completed their trips by 6 pm. Lastly, the mean trip length is known to be 3.5 hours.
For the purposes of this package, all times are expressed as fractions of the fishing day. In other words, if the fishing day is 12 hours long (e.g., from 6 am to 6 pm) and an angler starts their trip at time 2 and ends at time 4, they started at 8 am and ended at 10 am.
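As a quick sketch of that convention, converting a fishing-day time back to a clock time is just a matter of adding the day's opening hour (the `to_clock_hour()` helper below is illustrative, not part of the package; the 6 am start comes from the example above):

```r
# Convert times expressed in fishing-day hours to clock hours,
# assuming the fishing day opens at 6 am (as in the example above)
day_start <- 6
to_clock_hour <- function(t) day_start + t
to_clock_hour(c(2, 4))  # 8 and 10, i.e., 8 am and 10 am
```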
The make_anglers()
function builds a population of
anglers:
library(AnglerCreelSurveySimulation)
anglers <- make_anglers(n_anglers = 100, mean_trip_length = 3.5, fishing_day_length = 12)
make_anglers()
returns a dataframe with
start_time
, trip_length
, and
departure_time
for all anglers.
head(anglers)
#> start_time trip_length departure_time
#> 1 2.344548 0.6990596 3.043607
#> 2 8.879916 1.7421654 10.622081
#> 3 2.132229 1.5513144 3.683544
#> 4 2.455847 5.0693691 7.525216
#> 5 5.909529 1.9754853 7.885014
#> 6 10.956395 0.5725891 11.528984
In the head(anglers) statement, you can see that start_time, trip_length, and departure_time are all available for each angler. The first angler started their trip roughly 2.34 hours into the fishing day, fished for 0.7 hours, and left the access point 3.04 hours into the fishing day. Angler start times are drawn from the uniform distribution, and trip lengths are drawn from the gamma distribution. To get the true effort of this angler population, summing trip_length is all that's needed: sum(anglers$trip_length).
The distribution of angler trip lengths can be easily visualized:
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(ggplot2)
# Histogram overlaid with kernel density curve
anglers %>%
  ggplot(aes(x = trip_length)) +
  geom_histogram(aes(y = after_stat(density)),
                 binwidth = 0.1,
                 colour = "black", fill = "white") +
  geom_density(alpha = 0.2, fill = "#FF6666")
Once the population of anglers has been created, the next function to apply is get_total_values(). In get_total_values(), the user specifies the start time, end time, and wait time of the creel surveyor. Here is where the user also specifies the sampling probability of the anglers (in most cases, equal to $\frac{\text{wait time}}{\text{fishing day length}}$) and the mean catch rate of the fishery. There are a number of default settings in get_total_values(); see ?get_total_values for a description of how the function handles NULL values for start_time, end_time, and wait_time. start_time and wait_time are the times that the surveyor started and waited at the access point. total_catch and true_effort are the total (or real) values for catch and effort. mean_lambda is the mean catch rate for all anglers. Even though we assign mean_catch_rate in get_total_values(), individual mean catch rates are simulated by rgamma() with shape equal to mean_catch_rate and rate equal to 1.
For this walk through, we’ll schedule the surveyor to work for a total of eight hours at the sole access point in our fishery:
anglers %>%
get_total_values(start_time = 0, wait_time = 8, circuit_time = 8, mean_catch_rate = 2.5,
fishing_day_length = 12)
#> n_observed_trips total_observed_trip_effort n_completed_trips
#> 1 83 205.6881 51
#> total_completed_trip_effort total_completed_trip_catch start_time wait_time
#> 1 125.9844 402.5573 0 8
#> total_catch true_effort mean_lambda
#> 1 878.151 298.652 2.868091
get_total_values()
returns a single row data frame with
several columns. The output of get_total_values()
is the
catch and effort data observed by the surveyor during their wait at the
access point along with the “true” values for catch and effort.
(Obviously, we can’t simulate biological data but, if an agency’s
protocol directed the surveyor to collect biological data, that could be
analyzed with other R
functions.)
In the output from get_total_values()
,
n_observed_trips
is the number of trips that the surveyor
observed, including anglers that arrived after she started her day and
anglers that were there for the duration of her time at the access
point. total_observed_trip_effort
is the effort expended by
those parties; because the observed trips were not complete, she did not
count their catch. n_completed_trips
is the number of
anglers that completed their trips while she was onsite,
total_completed_trip_effort
is the effort expended by those
anglers, and total_completed_trip_catch
is the number of
fish caught by those parties. Catch is both the number of fish harvested
and those caught and released.
Effort and catch are estimated from the Bus Route Estimator:
$$ \widehat{E} = T\sum\limits_{i=1}^n{\frac{1}{w_{i}}}\sum\limits_{j=1}^m{\frac{e_{ij}}{\pi_{j}}} $$
where
* $\widehat{E}$ is the estimated total effort,
* $T$ is the total time to complete a full circuit of the route, including traveling and waiting,
* $w_{i}$ is the waiting time at the $i$th site,
* $e_{ij}$ is the effort observed for the $j$th angler (or party) at the $i$th site, and
* $\pi_{j}$ is the probability of observing the $j$th angler.
Catch rate is calculated from the Ratio of Means equation:
$$ \widehat{R_1} = \frac{\sum\limits_{i=1}^n{c_i/n}}{\sum\limits_{i=1}^n{L_i/n}} $$
where
* $c_i$ is the catch of the $i$th sampling unit (angler or party),
* $L_i$ is the length of the fishing trip at the time of the interview, and
* $n$ is the number of interviews.
For incomplete surveys, $L_i$ represents an incomplete trip length.
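Both estimators are straightforward to sketch in R. The function names and toy inputs below are hypothetical illustrations, not part of the package:

```r
# Toy Bus Route Estimator: T is circuit time, w[i] the wait at site i,
# e[[i]] the observed effort of each party at site i, and
# p[[i]] the probability of observing each of those parties
bus_route_effort <- function(T, w, e, p) {
  per_site <- mapply(function(e_i, p_i) sum(e_i / p_i), e, p)
  T * sum(per_site / w)
}

# Ratio of Means catch rate: mean catch divided by mean trip length
rom_catch_rate <- function(catch, trip_length) {
  mean(catch) / mean(trip_length)
}

# Two hypothetical sites with made-up observations
bus_route_effort(T = 8, w = c(4, 3.5),
                 e = list(c(1.2, 0.8), 2.0),
                 p = list(c(0.5, 0.5), 0.5))  # ~17.14 party-hours
rom_catch_rate(catch = c(3, 0, 5), trip_length = c(2.0, 1.5, 4.0))
```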
simulate_bus_route() calculates effort and catch based upon these equations. See ?simulate_bus_route for references that include a more detailed discussion of these equations.
simulate_bus_route() calls make_anglers() and get_total_values(), so many of the same arguments we passed to the previous functions need to be passed to simulate_bus_route(). The new argument, n_sites, is the number of sites visited by the surveyor. In more advanced simulations (see the examples in ?simulate_bus_route), you can pass vectors of values for start_time, wait_time, n_sites, and n_anglers to simulate a bus route-type survey rather than just a single access-point survey.
sim <- simulate_bus_route(start_time = 0, wait_time = 8, n_sites = 1, n_anglers = 100,
mean_catch_rate = 2.5, fishing_day_length = 12)
sim
#> Ehat catch_rate_ROM true_catch true_effort mean_lambda
#> 1 251.5783 2.717137 821.3414 336.6918 2.58303
The output from simulate_bus_route() is a dataframe with values for Ehat, catch_rate_ROM (the ratio-of-means catch rate), true_catch, true_effort, and mean_lambda. Ehat is the estimated total effort from the Bus Route Estimator above, and catch_rate_ROM is the catch rate estimated from the Ratio of Means equation. true_catch, true_effort, and mean_lambda are the same as before. Multiplying Ehat by catch_rate_ROM gives an estimate of total catch: 251.5783 × 2.717137 ≈ 683.57.
With information about the fishery, the start and wait times of the
surveyor, the sampling probability, mean catch rate, and fishing day
length, we can run multiple simulations with
conduct_multiple_surveys()
.
conduct_multiple_surveys()
is a wrapper that calls the
other three functions in turn and compiles the values into a data frame
for easy plotting or analysis. The only additional argument needed is n_sims, which tells the function how many simulations to conduct. For the sake of this simple simulation, let's assume that the creel surveyor works five days a week for four weeks (i.e., 20 days):
sim <- conduct_multiple_surveys(n_sims = 20, start_time = 0, wait_time = 8, n_sites = 1,
n_anglers = 100,
mean_catch_rate = 2.5, fishing_day_length = 12)
sim
#> Ehat catch_rate_ROM true_catch true_effort mean_lambda
#> 1 251.2751 2.715368 891.6315 347.5472 2.502541
#> 2 262.9803 2.697911 950.5193 347.9895 2.744980
#> 3 239.5772 2.140410 701.7522 316.6104 2.261379
#> 4 211.5798 2.563842 757.3305 316.6688 2.278562
#> 5 265.1032 2.089420 765.4021 328.3983 2.180437
#> 6 219.7396 2.669585 851.2098 337.4097 2.530436
#> 7 259.8684 2.135428 880.3317 342.4824 2.544421
#> 8 269.0109 2.733957 881.6512 356.4888 2.527161
#> 9 254.4988 3.200268 1005.9532 346.1834 2.870056
#> 10 265.6539 2.348372 951.6683 343.8136 2.675011
#> 11 236.6423 2.916161 886.9852 338.9313 2.592258
#> 12 253.0033 2.227817 790.3172 346.9199 2.352073
#> 13 247.4905 2.127633 728.1211 330.5219 2.307401
#> 14 273.2019 2.796893 973.5158 355.1038 2.695640
#> 15 244.8440 2.498293 839.0884 313.3245 2.479512
#> 16 289.4480 2.416392 855.9033 373.7626 2.371455
#> 17 250.0151 2.578717 905.4115 352.0875 2.579268
#> 18 234.8942 2.036539 685.8589 306.7750 2.320789
#> 19 238.3071 2.665224 826.9444 333.3004 2.455166
#> 20 256.2799 2.479191 809.7686 331.2354 2.324683
With the output from multiple simulations, an analyst can evaluate how closely the creel survey they've designed mirrors reality. An lm() of estimated catch as a function of true_catch can tell us whether the survey will over- or underestimate reality:
mod <-
sim %>%
lm((Ehat * catch_rate_ROM) ~ true_catch, data = .)
summary(mod)
#>
#> Call:
#> lm(formula = (Ehat * catch_rate_ROM) ~ true_catch, data = .)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -102.169 -11.936 -1.445 26.685 77.225
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -103.5678 101.9236 -1.016 0.323
#> true_catch 0.8641 0.1197 7.219 1.03e-06 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 46.87 on 18 degrees of freedom
#> Multiple R-squared: 0.7433, Adjusted R-squared: 0.729
#> F-statistic: 52.11 on 1 and 18 DF, p-value: 1.026e-06
Plotting the data and the model provide a good visual means of evaluating how close our estimates are to reality:
#Create a new vector of the estimated effort multiplied by estimated catch rate
sim <-
sim %>%
mutate(est_catch = Ehat * catch_rate_ROM)
sim %>%
ggplot(aes(x = true_catch, y = est_catch)) +
geom_point() +
  geom_abline(intercept = mod$coefficients[1], slope = mod$coefficients[2],
              colour = "red", linewidth = 1.01)
The closer the slope parameter estimate is to 1 and the intercept parameter estimate is to 0, the closer our estimate of catch is to reality.
We can create a model and plot of our effort estimates, too:
mod <-
sim %>%
lm(Ehat ~ true_effort, data = .)
summary(mod)
#>
#> Call:
#> lm(formula = Ehat ~ true_effort, data = .)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -30.761 -7.544 4.046 9.332 21.560
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -10.0047 62.5832 -0.160 0.874770
#> true_effort 0.7721 0.1848 4.178 0.000565 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 13.35 on 18 degrees of freedom
#> Multiple R-squared: 0.4923, Adjusted R-squared: 0.4641
#> F-statistic: 17.46 on 1 and 18 DF, p-value: 0.0005652
#Create a new vector of the estimated effort multiplied by estimated catch rate
sim %>%
ggplot(aes(x = true_effort, y = Ehat)) +
geom_point() +
  geom_abline(intercept = mod$coefficients[1], slope = mod$coefficients[2],
              colour = "red", linewidth = 1.01)
If the start time and wait time equal 0 and the length of the fishing day, respectively, the creel surveyor can observe all completed trips, though she'd likely be unhappy having to work 12 hours. The inputs have to be adjusted to allow her to arrive at time 0, stay for all 12 hours, and have a probability of 1.0 of intercepting every angler:
start_time <- 0
wait_time <- 12
sampling_prob <- 1
sim <- conduct_multiple_surveys(n_sims = 20, start_time = start_time, wait_time = wait_time,
n_sites = 1, n_anglers = 100,
mean_catch_rate = 2.5, fishing_day_length = wait_time)
sim
#> Ehat catch_rate_ROM true_catch true_effort mean_lambda
#> 1 355.8481 2.507862 892.4179 355.8481 2.428310
#> 2 340.2682 2.290488 779.3804 340.2682 2.285644
#> 3 371.5307 2.684120 997.2332 371.5307 2.650148
#> 4 304.8843 2.395969 730.4934 304.8843 2.430427
#> 5 371.0462 2.506491 930.0238 371.0462 2.482865
#> 6 306.8586 2.678087 821.7941 306.8586 2.491780
#> 7 343.0437 2.533606 869.1375 343.0437 2.432189
#> 8 353.2809 2.505150 885.0216 353.2809 2.568789
#> 9 349.0948 2.429015 847.9566 349.0948 2.309156
#> 10 357.8227 2.502592 895.4841 357.8227 2.528299
#> 11 322.5526 2.532571 816.8874 322.5526 2.636067
#> 12 356.7094 2.546643 908.4116 356.7094 2.625374
#> 13 349.8136 2.417021 845.5071 349.8136 2.467017
#> 14 330.4801 2.586246 854.7028 330.4801 2.583102
#> 15 339.6830 2.560220 869.6630 339.6830 2.504666
#> 16 350.1321 2.598496 909.8169 350.1321 2.619427
#> 17 338.7259 2.758521 934.3824 338.7259 2.738683
#> 18 350.9463 2.204763 773.7534 350.9463 2.211754
#> 19 313.2939 2.040435 639.2558 313.2939 2.109854
#> 20 340.5213 2.639586 898.8352 340.5213 2.564667
mod <-
sim %>%
  lm(Ehat ~ true_effort, data = .)
summary(mod)
#> Warning in summary.lm(mod): essentially perfect fit: summary may be unreliable
#>
#> Call:
#> lm(formula = Ehat ~ true_effort, data = .)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -4.696e-14 -4.680e-16 2.437e-15 4.626e-15 1.550e-14
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 1.017e-13 5.200e-14 1.955e+00 0.0663 .
#> true_effort 1.000e+00 1.517e-16 6.592e+15 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 1.244e-14 on 18 degrees of freedom
#> Multiple R-squared: 1, Adjusted R-squared: 1
#> F-statistic: 4.346e+31 on 1 and 18 DF, p-value: < 2.2e-16
If our hypothetical fishery suddenly gained another access point and the original 100 anglers were split equally between the two access points, what kind of information would a creel survey capture? We could ask our surveyor to split her eight-hour work day between both access points, but she'll have to drive 0.5 hours to get from one to the other. Of course, that 0.5 hours of drive time is part of her work day, so she'll effectively have 7.5 hours to spend at the access points counting anglers and collecting data.
start_time <- c(0, 4.5)
wait_time <- c(4, 3.5)
n_sites = 2
n_anglers <- c(50, 50)
fishing_day_length <- 12
# sampling_prob <- sum(wait_time)/fishing_day_length
sim <- conduct_multiple_surveys(n_sims = 20, start_time = start_time, wait_time = wait_time,
n_sites = n_sites, n_anglers = n_anglers,
mean_catch_rate = 2.5,
fishing_day_length = fishing_day_length)
sim
#> Ehat catch_rate_ROM true_catch true_effort mean_lambda
#> 1 747.0897 2.395948 781.2451 349.6246 2.411676
#> 2 1049.7223 1.765272 887.2324 351.9553 2.407386
#> 3 1001.8543 2.015416 860.2979 342.3451 2.486909
#> 4 939.5718 2.676889 845.5717 337.8541 2.480168
#> 5 836.0630 3.020539 842.3358 341.2702 2.428031
#> 6 1168.0913 2.110574 682.2155 324.0263 2.025390
#> 7 861.9932 2.231022 811.7799 331.9898 2.491920
#> 8 970.4394 2.133755 786.4610 319.8154 2.441453
#> 9 909.1530 2.293082 845.3929 348.6101 2.478829
#> 10 1076.2712 2.201420 790.6962 348.3613 2.314308
#> 11 1003.6039 2.961193 901.8364 336.2231 2.752778
#> 12 994.7747 2.093049 833.1171 346.3091 2.379673
#> 13 996.2269 2.234408 839.4745 356.3127 2.448600
#> 14 1020.5084 1.891334 832.9888 335.3736 2.443650
#> 15 843.4247 2.861870 771.0300 326.6157 2.437588
#> 16 936.2151 2.107805 799.7216 313.7693 2.565320
#> 17 947.0207 2.595076 801.8594 324.4572 2.488821
#> 18 974.9905 2.724325 753.4912 324.6000 2.405841
#> 19 1077.8457 2.430041 946.5520 368.0249 2.557546
#> 20 804.6284 3.158537 838.6308 326.1016 2.615504
mod <-
sim %>%
  lm(Ehat ~ true_effort, data = .)
summary(mod)
#>
#> Call:
#> lm(formula = Ehat ~ true_effort, data = .)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -226.09 -69.24 20.17 51.98 227.50
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 527.989 579.589 0.911 0.374
#> true_effort 1.273 1.715 0.742 0.467
#>
#> Residual standard error: 104 on 18 degrees of freedom
#> Multiple R-squared: 0.02972, Adjusted R-squared: -0.02419
#> F-statistic: 0.5513 on 1 and 18 DF, p-value: 0.4674
Ultimately, the creel survey simulation can be as complicated as a creel survey. If a survey requires multiple clerks, several simulations can be coupled together to act as multiple surveyors. To accommodate weekends or holidays (i.e., increased angler pressure), additional simulations with different wait times and more anglers (to simulate the higher pressure) can be built into the simulation. For example, if we know that angler pressure is 50% higher at the two access points on weekends, we can hire a second clerk to sample 8 hours a day on the weekends (one day at each access point) and add the weekend data to the weekday data.
#Weekend clerks
start_time_w <- 2
wait_time_w <- 10
n_sites <- 1
n_anglers_w <- 75
fishing_day_length <- 12
sampling_prob <- 8/12
sim_w <- conduct_multiple_surveys(n_sims = 8, start_time = start_time_w,
wait_time = wait_time_w, n_sites = n_sites,
n_anglers = n_anglers_w,
mean_catch_rate = 2.5,
fishing_day_length = fishing_day_length)
sim_w
#> Ehat catch_rate_ROM true_catch true_effort mean_lambda
#> 1 394.9863 2.523206 697.4651 276.2823 2.652318
#> 2 383.8820 2.436196 649.4526 266.5847 2.409292
#> 3 395.5739 2.755471 756.9392 274.7041 2.744176
#> 4 354.8569 2.734374 674.7273 247.3653 2.931638
#> 5 379.7376 2.622277 695.1953 265.0997 2.685504
#> 6 367.8180 2.804485 722.4389 256.8065 2.863010
#> 7 380.3324 2.454154 648.1905 264.1197 2.526219
#> 8 401.1485 2.456887 684.4281 278.5754 2.463277
#Add the weekday survey and weekend surveys to the same data frame
mon_survey <-
sim_w %>%
bind_rows(sim)
mod <-
mon_survey %>%
lm(Ehat ~ true_effort, data = .)
summary(mod)
#>
#> Call:
#> lm(formula = Ehat ~ true_effort, data = .)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -278.17 -77.48 -11.91 64.92 326.12
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -1478.1192 207.6681 -7.118 1.47e-07 ***
#> true_effort 7.1602 0.6507 11.004 2.79e-11 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 119.4 on 26 degrees of freedom
#> Multiple R-squared: 0.8232, Adjusted R-squared: 0.8164
#> F-statistic: 121.1 on 1 and 26 DF, p-value: 2.79e-11
Hopefully, this vignette has shown you how to build and simulate your
own creel survey. It’s flexible enough to estimate monthly or seasonal
changes in fishing day length, changes in the mean catch rate, increased
angler pressure on weekends, and any number of access sites, start
times, wait times, and sampling probabilities. The output from
conduct_multiple_surveys()
allows the user to estimate
variability in the catch and effort estimates (e.g., relative standard
error) to evaluate the most efficient creel survey for their
fishery.
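For example, the relative standard error mentioned above can be computed directly from the simulation output. The `rse()` helper and the effort values below are illustrative, not part of the package; in practice you would pass a column such as `sim$Ehat`:

```r
# Relative standard error: standard error of the mean divided by the mean
rse <- function(x) (sd(x) / sqrt(length(x))) / mean(x)

# Illustrative effort estimates; in practice, rse(sim$Ehat)
est_effort <- c(251.3, 263.0, 239.6, 211.6, 265.1)
rse(est_effort)
```

Comparing RSE across candidate survey designs (different start times, wait times, or numbers of sites) is one way to pick the most efficient design for a fixed budget.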