Simulating Creel Surveys

Introduction

Creel surveys allow fisheries scientists and managers to collect data on catch and harvest, on the angler population (including the effort anglers expend), and, depending on survey design, biological data on fish populations. Although creel surveys are an important way to collect data on the people who use a fishery, they are difficult to implement, and graduate fisheries programs pay them little attention. As a result, fisheries managers (the first job for many fisheries-program graduates) often inherit old surveys or are told to institute new surveys with little knowledge of how to do so.

Fisheries can cover large spatial extents: large reservoirs, coastlines, and river systems. A creel survey has to be statistically valid, adaptable to the geographic challenges of the fishery, and cost efficient. Limited budgets can prevent agencies from implementing creel surveys; the AnglerCreelSurveySimulation package was designed to help managers explore the type of creel survey that is most appropriate for their fishery, including fisheries with multiple access points, access points that are more popular than others, variation in catch rate, the number of surveyors, and seasonal variation in day length.

The AnglerCreelSurveySimulation package does require that users know something about their fishery and the human dimensions of that fishery. Prior knowledge includes the mean trip length for a party (or individual) and the mean catch rate of the fishery.

The AnglerCreelSurveySimulation package is simple but powerful. Four functions (make_anglers(), get_total_values(), simulate_bus_route(), and conduct_multiple_surveys()) let users create a population of anglers, set the length of the fishing day to any value, specify a mean trip length for the population, and simulate and evaluate a survey of that population. Ultimately, the user only needs to know the final function, conduct_multiple_surveys(), but because I'd rather this not be a black box of functions, this brief introduction walks through the package step by step.

A walk-through of the package

This tutorial assumes that we have a very simple, small fishery with only one access point that, on any given day, is visited by 100 anglers. The fishing day length for our theoretical fishery is 12 hours (say, from 6 am to 6 pm), and all anglers must complete their trips by 6 pm. Lastly, the mean trip length is known to be 3.5 hours.

For the purposes of this package, all times are measured in hours from the start of the fishing day. In other words, if the fishing day is 12 hours long (e.g., from 6 am to 6 pm) and an angler starts their trip at 2 and ends it at 4, they started their trip at 8 am and ended it at 10 am.
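Converting fishing-day time back to clock time is simple addition. The helper below is a hypothetical convenience function (it is not part of AnglerCreelSurveySimulation) and assumes the fishing day starts at 6 am:

```r
# Hypothetical helper (not a package function): convert fishing-day hours to a
# 24-hour clock string, assuming the fishing day starts at day_start_hour.
fishing_time_to_clock <- function(t, day_start_hour = 6) {
  total_minutes <- round((day_start_hour + t) * 60)
  hours <- (total_minutes %/% 60) %% 24
  minutes <- total_minutes %% 60
  sprintf("%02d:%02d", hours, minutes)
}

fishing_time_to_clock(c(2, 4))  # "08:00" "10:00"
```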

The make_anglers() function builds a population of anglers:


library(AnglerCreelSurveySimulation)

anglers <- make_anglers(n_anglers = 100, mean_trip_length = 3.5, fishing_day_length = 12)

make_anglers() returns a data frame with start_time, trip_length, and departure_time for all anglers.


head(anglers)
#>   start_time trip_length departure_time
#> 1 0.02782571    5.583119      5.6109447
#> 2 0.33442276    0.526787      0.8612097
#> 3 7.62885286    1.859801      9.4886534
#> 4 0.60615188    2.747347      3.3534985
#> 5 4.94777591    1.880994      6.8287702
#> 6 6.99811419    2.463352      9.4614660

The output of head(anglers) shows that start_time, trip_length, and departure_time are available for each angler. The first angler started their trip roughly 0.03 hours into the fishing day, fished for 5.58 hours, and left the access point 5.61 hours into the fishing day. Angler start times are drawn from a uniform distribution, and trip lengths are drawn from a gamma distribution. To get the true effort of this angler population, all that's needed is to sum trip_length: sum(anglers$trip_length).
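To make the two distributions concrete, here is a minimal base R sketch of an angler population. It is not the source of make_anglers(): the helper name sketch_anglers() is hypothetical, and the gamma parameterization (shape equal to the mean trip length, rate 1) and the rule that start times are drawn so every trip ends by the end of the day are assumptions.

```r
# Hypothetical sketch of an angler population; NOT the source of make_anglers().
# Assumptions: gamma(shape = mean_trip_length, rate = 1) trip lengths, and
# uniform start times drawn so that every trip ends within the fishing day.
sketch_anglers <- function(n_anglers, mean_trip_length, fishing_day_length) {
  # Gamma trip lengths with mean = shape / rate = mean_trip_length
  trip_length <- rgamma(n_anglers, shape = mean_trip_length, rate = 1)
  # Keep every trip within the fishing day
  trip_length <- pmin(trip_length, fishing_day_length)
  # Uniform start times, with each angler's upper bound leaving room for the trip
  start_time <- runif(n_anglers, min = 0, max = fishing_day_length - trip_length)
  data.frame(start_time = start_time,
             trip_length = trip_length,
             departure_time = start_time + trip_length)
}

set.seed(1)
anglers_sketch <- sketch_anglers(n_anglers = 100, mean_trip_length = 3.5,
                                 fishing_day_length = 12)
# True effort: total party-hours across all anglers
true_effort <- sum(anglers_sketch$trip_length)
```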

The distribution of angler trip lengths can be easily visualized:


library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(ggplot2)

# Histogram overlaid with kernel density curve
anglers %>%
  ggplot(aes(x = trip_length)) + 
  geom_histogram(aes(y = after_stat(density)), 
                 binwidth = 0.1,
                 colour = "black", fill = "white") +
  geom_density(alpha = 0.2, fill = "#FF6666")

Once the population of anglers has been created, the next function to apply is get_total_values(). In get_total_values(), the user specifies the start time, end time, and wait time of the creel surveyor. Here the user also specifies the sampling probability of the anglers (in most cases, equal to \(\frac{\text{wait time}}{\text{fishing day length}}\)) and the mean catch rate of the fishery. get_total_values() has a number of default settings; see ?get_total_values for a description of how the function handles NULL values for start_time, end_time, and wait_time.

In the output, start_time and wait_time are the times that the surveyor started and waited at the access point. total_catch and true_effort are the total (or real) values for catch and effort. mean_lambda is the mean catch rate for all anglers. Even though we passed mean_catch_rate to get_total_values(), individual mean catch rates are simulated by rgamma() with shape equal to mean_catch_rate and rate equal to 1.
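The catch-rate step described above can be sketched in base R. The per-angler rgamma() draw follows the text; the trip-length draw and the Poisson step that turns each angler's rate into a number of fish are assumptions made for illustration, not confirmed package internals.

```r
# Hypothetical sketch of individual catch rates and catch.
# rgamma(shape = mean_catch_rate, rate = 1) follows the text above; the
# trip-length draw and the Poisson catch step are illustrative assumptions.
set.seed(2)
n_anglers <- 100
mean_catch_rate <- 2.5
trip_length <- rgamma(n_anglers, shape = 3.5, rate = 1)   # assumed trip lengths
lambda <- rgamma(n_anglers, shape = mean_catch_rate, rate = 1)
catch <- rpois(n_anglers, lambda = lambda * trip_length)  # fish per trip (assumed)

# A gamma(shape = k, rate = 1) draw has mean k, so mean(lambda) is close to 2.5
c(mean_lambda = mean(lambda), total_catch = sum(catch),
  true_effort = sum(trip_length))
```

Because the gamma mean is shape divided by rate, fixing the rate at 1 makes mean_catch_rate the expected individual rate, which is why mean_lambda in the package output sits near the value passed in.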

For this walk-through, we'll schedule the surveyor to work for a total of eight hours at the sole access point in our fishery:


anglers %>%
  get_total_values(start_time = 0, wait_time = 8, circuit_time = 8, mean_catch_rate = 2.5, 
                   fishing_day_length = 12)
#>   n_observed_trips total_observed_trip_effort n_completed_trips
#> 1               90                   243.4483                57
#>   total_completed_trip_effort total_completed_trip_catch start_time wait_time
#> 1                    170.3483                   410.1601          0         8
#>   total_catch true_effort mean_lambda
#> 1    838.2736    314.7418    2.734755

get_total_values() returns a single-row data frame with several columns: the catch and effort data observed by the surveyor during her wait at the access point, along with the “true” values for catch and effort. (We can't simulate biological data, but if an agency's protocol directed the surveyor to collect biological data, those data could be analyzed with other R functions.)

In the output from get_total_values(), n_observed_trips is the number of trips that the surveyor observed, including anglers that arrived after she started her day and anglers that were there for the entire time she was at the access point. total_observed_trip_effort is the effort expended by those parties; because the observed trips were not complete, she did not count their catch. n_completed_trips is the number of anglers that completed their trips while she was on site, total_completed_trip_effort is the effort expended by those anglers, and total_completed_trip_catch is the number of fish caught by those parties. Catch includes both fish harvested and fish caught and released.
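The distinction between observed and completed trips can be sketched with interval logic. This is a hypothetical helper, and the rules it uses (a trip is observed if it overlaps the surveyor's wait window, and completed if the angler departs during that window) are assumptions that may differ from the package's exact rules.

```r
# Hypothetical sketch (not package code) of classifying trips against one
# surveyor's wait window [wait_start, wait_start + wait_time].
classify_trips <- function(start_time, departure_time, wait_start, wait_time) {
  wait_end <- wait_start + wait_time
  # Present at some point while the surveyor is on site
  observed <- start_time < wait_end & departure_time > wait_start
  # Departed (and could be interviewed) while the surveyor is on site
  completed <- departure_time > wait_start & departure_time <= wait_end
  trip_length <- departure_time - start_time
  data.frame(n_observed_trips = sum(observed),
             n_completed_trips = sum(completed),
             total_completed_trip_effort = sum(trip_length[completed]))
}

# Five made-up trips against an 8-hour wait starting at time 0
trips <- classify_trips(start_time = c(0, 1, 2, 7, 9),
                        departure_time = c(5, 11, 6, 9.5, 11.5),
                        wait_start = 0, wait_time = 8)
trips
```

Under these rules, every completed trip is also an observed trip, and the 9-to-11.5 trip is missed entirely because it starts after the surveyor leaves.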

Estimating catch and effort

Effort is estimated with the Bus Route Estimator:

\[ \widehat{E} = T\sum\limits_{i=1}^n{\frac{1}{w_{i}}}\sum\limits_{j=1}^m{\frac{e_{ij}}{\pi_{j}}} \]

where

  • \(\widehat{E}\) = estimated total party-hours of effort;
  • \(T\) = total time to complete a full circuit of the route, including traveling and waiting;
  • \(w_i\) = waiting time at the \(i\)th site (where \(i = 1, \ldots, n\) sites);

and

  • \(e_{ij}\) = total time that the \(j\)th car (or trailer) is parked at the \(i\)th site while the agent is at that site (where \(j = 1, \ldots, m\) cars);
  • \(\pi_j\) = the sampling probability for the \(j\)th car.
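The estimator can be written directly in base R. In this sketch, \(e_{ij}\) is computed as the overlap between each car's parked interval and the surveyor's wait window, which matches the definition above; the helper names parked_overlap() and bus_route_effort() are hypothetical, not package functions, and the \(\pi_j\) values are supplied by the caller.

```r
# Hypothetical helpers (not package functions) for the Bus Route Estimator.

# e_ij for one site: time each car is parked while the agent is on site,
# i.e. the overlap of [arrive, depart] with [wait_start, wait_end]
parked_overlap <- function(arrive, depart, wait_start, wait_end) {
  pmax(0, pmin(depart, wait_end) - pmax(arrive, wait_start))
}

# E-hat = T * sum_i (1 / w_i) * sum_j (e_ij / pi_j)
# circuit_time: T; wait_times: w_i (one per site)
# e_list: list of e_ij vectors, one per site
# pi_list: list of pi_j values (vector or scalar), one entry per site
bus_route_effort <- function(circuit_time, wait_times, e_list, pi_list) {
  site_terms <- mapply(function(w, e, p) sum(e / p) / w,
                       wait_times, e_list, pi_list)
  circuit_time * sum(site_terms)
}

# One site, surveyor on site from hour 2 to hour 10, three parked cars
e_site1 <- parked_overlap(arrive = c(1, 3, 9), depart = c(4, 11, 12),
                          wait_start = 2, wait_end = 10)
e_site1                                           # 2 7 1
bus_route_effort(circuit_time = 8, wait_times = 8,
                 e_list = list(e_site1), pi_list = list(1))  # 10
```

With a single site where \(T = w_1\) and \(\pi_j = 1\), the estimate is simply the total parked time the surveyor saw; the \(T / w_i\) factor only matters when the surveyor splits the circuit across sites or travels between them.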

Catch rate is calculated from the Ratio of Means equation:

\[ \widehat{R_1} = \frac{\sum\limits_{i=1}^n{c_i/n}}{\sum\limits_{i=1}^n{L_i/n}} \]

where

  • \(c_i\) is the catch for the \(i\)th sampling unit;

and

  • \(L_i\) is the length of the fishing trip at the time of the interview.

For incomplete trips (parties interviewed before they finish fishing), \(L_i\) is the length of the trip up to the time of the interview.
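The Ratio of Means equation translates line for line into R. This is a minimal sketch with a hypothetical function name and made-up interview data, not the package's internal code:

```r
# Ratio of Means catch rate: mean catch divided by mean trip length
ratio_of_means <- function(catch, trip_length) {
  n <- length(catch)
  (sum(catch) / n) / (sum(trip_length) / n)
}

catch <- c(4, 0, 7, 2)          # hypothetical interview data: fish per party
hours <- c(2, 1.5, 3, 1.5)      # trip length at the time of the interview
r_hat <- ratio_of_means(catch, hours)
r_hat                           # 1.625 fish per hour
```

The \(n\) terms cancel, so the estimator reduces to sum(catch) / sum(trip_length). Multiplying it by an effort estimate gives an estimate of total catch, the same step used with Ehat and catch_rate_ROM.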

simulate_bus_route() calculates effort and catch rate from these equations. See ?simulate_bus_route for references that include a more detailed discussion of these equations.

simulate_bus_route() calls make_anglers() and get_total_values(), so many of the arguments we passed to those functions also need to be passed to simulate_bus_route(). The new argument, n_sites, is the number of sites visited by the surveyor. In more advanced simulations (see the examples in ?simulate_bus_route), you can pass vectors of values for start_time, wait_time, and n_anglers (one value per site), with n_sites set to the number of sites, to simulate a bus route-type survey rather than just a single access-point survey.


sim <- simulate_bus_route(start_time = 0, wait_time = 8, n_sites = 1, n_anglers = 100,
                          mean_catch_rate = 2.5, fishing_day_length = 12)

sim
#>       Ehat catch_rate_ROM true_catch true_effort mean_lambda
#> 1 254.1131       2.609279   849.2175    332.9894    2.554031

The output from simulate_bus_route() is a data frame with values for Ehat, catch_rate_ROM (the ratio of means catch rate), true_catch, true_effort, and mean_lambda. Ehat is the estimated total effort from the Bus Route Estimator above, and catch_rate_ROM is the catch rate estimated from the Ratio of Means equation. true_catch, true_effort, and mean_lambda are the same as before. Multiplying Ehat by catch_rate_ROM gives an estimate of total catch: about 663.05.

Conducting multiple simulations

With information about the fishery, the start and wait times of the surveyor, the sampling probability, the mean catch rate, and the fishing day length, we can run multiple simulations with conduct_multiple_surveys(). conduct_multiple_surveys() is a wrapper that calls the other three functions in turn and compiles the values into a data frame for easy plotting or analysis. The only additional argument needed is n_sims, which tells the function how many simulations to conduct. For this simple simulation, let's assume that the creel survey runs five days a week for four weeks (i.e., 20 days), with each simulation representing one survey day:


sim <- conduct_multiple_surveys(n_sims = 20, start_time = 0, wait_time = 8, n_sites = 1,
                                n_anglers = 100, 
                                mean_catch_rate = 2.5, fishing_day_length = 12)

sim
#>        Ehat catch_rate_ROM true_catch true_effort mean_lambda
#> 1  283.9901       2.624168   957.2048    371.8270    2.591832
#> 2  257.3739       2.274775   865.4242    359.7312    2.422283
#> 3  263.3519       2.531531   840.7083    341.1700    2.502158
#> 4  286.3035       2.636459   847.5561    358.2680    2.493317
#> 5  241.0059       2.091205   746.8896    320.3706    2.360773
#> 6  283.4753       2.035783   745.4144    365.9348    2.136867
#> 7  231.0606       2.425965   858.2096    341.9932    2.431244
#> 8  256.9581       2.441059   845.8885    346.7909    2.536444
#> 9  245.2889       2.665434   820.1221    326.3658    2.419219
#> 10 272.9459       2.622623   939.6975    362.7497    2.639500
#> 11 242.4129       2.455266   940.5788    341.4352    2.726100
#> 12 244.3567       2.472028   872.7614    339.9551    2.442740
#> 13 258.6222       2.548895   894.0612    348.0127    2.602706
#> 14 256.5682       3.472415  1068.3938    361.1228    2.832822
#> 15 249.7969       2.632862  1023.9092    359.4603    2.679270
#> 16 223.9953       3.015796   867.6795    311.0071    2.816511
#> 17 261.3882       2.189277   762.6096    351.2963    2.163427
#> 18 277.8554       2.414402   872.5733    351.8118    2.570118
#> 19 270.4510       2.458694   826.1457    352.3857    2.411958
#> 20 275.3377       2.704482   842.6900    332.3657    2.399357

With the output from multiple simulations, an analyst can evaluate how closely the creel survey they've designed mirrors reality. A linear model (lm()) of estimated catch as a function of true_catch can tell us whether the survey will over- or underestimate reality:


mod <- 
  sim %>% 
  lm((Ehat * catch_rate_ROM) ~ true_catch, data = .)

summary(mod)
#> 
#> Call:
#> lm(formula = (Ehat * catch_rate_ROM) ~ true_catch, data = .)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -107.88  -56.21   10.17   31.93  115.13 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)   
#> (Intercept)  62.3735   163.8891   0.381  0.70797   
#> true_catch    0.6812     0.1872   3.639  0.00188 **
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 67.61 on 18 degrees of freedom
#> Multiple R-squared:  0.4239, Adjusted R-squared:  0.3919 
#> F-statistic: 13.25 on 1 and 18 DF,  p-value: 0.001875

Plotting the data and the model provides a good visual check of how close our estimates are to reality:


# Add a column of estimated catch: estimated effort multiplied by estimated catch rate
sim <- 
  sim %>%
  mutate(est_catch = Ehat * catch_rate_ROM)

sim %>% 
  ggplot(aes(x = true_catch, y = est_catch)) +
  geom_point() +
  geom_abline(intercept = mod$coefficients[1], slope = mod$coefficients[2], 
              colour = "red", linewidth = 1.01)

The closer the slope parameter estimate is to 1 and the intercept parameter estimate is to 0, the closer our estimate of catch is to reality.
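Alongside the regression, a mean relative bias condenses the comparison into a single number. The helper below is hypothetical base R, not a package function; the synthetic check uses estimates that are always 80% of the truth, so the slope should be 0.8, the intercept 0, and the relative bias -0.2.

```r
# Hypothetical helper: compare simulated estimates with true values.
estimate_vs_truth <- function(estimate, truth) {
  fit <- lm(estimate ~ truth)
  c(intercept = unname(coef(fit)[1]),
    slope = unname(coef(fit)[2]),
    # Average of (estimate - truth) / truth across simulated days
    mean_relative_bias = mean((estimate - truth) / truth))
}

set.seed(3)
truth <- runif(20, 700, 1000)   # synthetic "true" catch for 20 survey days
check <- estimate_vs_truth(estimate = truth * 0.8, truth = truth)
check
```

With package output, estimate_vs_truth(sim$est_catch, sim$true_catch) would apply the same summary to the simulated surveys.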

We can create a model and plot of our effort estimates, too:


mod <- 
  sim %>%
  lm(Ehat ~ true_effort, data = .)

summary(mod)
#> 
#> Call:
#> lm(formula = Ehat ~ true_effort, data = .)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -23.9880  -9.7125   0.3522   7.6868  27.8261 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) -12.6866    66.9324  -0.190 0.851788    
#> true_effort   0.7829     0.1926   4.065 0.000727 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 13.23 on 18 degrees of freedom
#> Multiple R-squared:  0.4786, Adjusted R-squared:  0.4497 
#> F-statistic: 16.52 on 1 and 18 DF,  p-value: 0.0007267

# Plot estimated effort against true effort, with the fitted regression line in red
sim %>%
  ggplot(aes(x = true_effort, y = Ehat)) +
  geom_point() +
  geom_abline(intercept = mod$coefficients[1], slope = mod$coefficients[2], 
              colour = "red", linewidth = 1.01)

Observing all trips

If the start time equals 0 and the wait time equals the length of the fishing day, the creel surveyor can observe all completed trips, though she'd likely be unhappy having to work 12 hours. The inputs have to be adjusted so that she arrives at time 0, stays for all 12 hours, and has a sampling probability of 1.0 of intercepting every angler:


start_time <- 0
wait_time <- 12
fishing_day_length <- 12
sampling_prob <- 1  # every angler is intercepted

sim <- conduct_multiple_surveys(n_sims = 20, start_time = start_time, wait_time = wait_time,
                                n_sites = 1, n_anglers = 100, 
                                mean_catch_rate = 2.5, fishing_day_length = fishing_day_length)

sim
#>        Ehat catch_rate_ROM true_catch true_effort mean_lambda
#> 1  338.4188       2.526191   854.9105    338.4188    2.492846
#> 2  335.3774       2.533732   849.7564    335.3774    2.662725
#> 3  322.1603       2.704045   871.1359    322.1603    2.683248
#> 4  332.8148       2.244621   747.0431    332.8148    2.211713
#> 5  344.1474       2.375886   817.6551    344.1474    2.504725
#> 6  330.3224       2.228176   736.0166    330.3224    2.327166
#> 7  351.7334       2.439408   858.0215    351.7334    2.428026
#> 8  350.9096       2.672565   937.8286    350.9096    2.691987
#> 9  317.5808       2.575393   817.8952    317.5808    2.586919
#> 10 343.8947       2.563955   881.7305    343.8947    2.565945
#> 11 305.1314       2.320839   708.1607    305.1314    2.409243
#> 12 347.5091       2.661144   924.7719    347.5091    2.552094
#> 13 324.4478       2.470399   801.5155    324.4478    2.453708
#> 14 305.9623       2.464212   753.9557    305.9623    2.464376
#> 15 341.9831       2.595890   887.7505    341.9831    2.469572
#> 16 340.1130       2.283058   776.4978    340.1130    2.316221
#> 17 336.3245       2.378349   799.8971    336.3245    2.438951
#> 18 340.3461       2.556218   869.9990    340.3461    2.511172
#> 19 359.5779       2.360543   848.7990    359.5779    2.399526
#> 20 345.1977       2.205940   761.4852    345.1977    2.363665

mod <- 
  sim %>%
  lm(Ehat ~ true_effort, data = .)

summary(mod)
#> Warning in summary.lm(mod): essentially perfect fit: summary may be unreliable
#> 
#> Call:
#> lm(formula = Ehat ~ true_effort, data = .)
#> 
#> Residuals:
#>        Min         1Q     Median         3Q        Max 
#> -1.847e-13  3.250e-16  1.398e-14  2.117e-14  7.051e-14 
#> 
#> Coefficients:
#>               Estimate Std. Error   t value Pr(>|t|)    
#> (Intercept) -2.542e-13  2.754e-13 -9.23e-01    0.368    
#> true_effort  1.000e+00  8.195e-16  1.22e+15   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 5.189e-14 on 18 degrees of freedom
#> Multiple R-squared:      1,  Adjusted R-squared:      1 
#> F-statistic: 1.489e+30 on 1 and 18 DF,  p-value: < 2.2e-16
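The perfect fit follows from the Bus Route Estimator itself. Assuming the circuit time \(T\) equals the 12-hour wait at the single site (no travel) and every \(\pi_j = 1\), the estimator reduces to the total parked time, and because the surveyor is on site for the entire fishing day, each \(e_{1j}\) is the full trip length of angler \(j\):

\[ \widehat{E} = T\sum\limits_{i=1}^{1}{\frac{1}{w_{i}}}\sum\limits_{j=1}^m{\frac{e_{ij}}{\pi_{j}}} = 12 \cdot \frac{1}{12}\sum\limits_{j=1}^m{\frac{e_{1j}}{1}} = \sum\limits_{j=1}^m{e_{1j}} \]

Summing every angler's complete trip length is exactly the true effort, so Ehat and true_effort agree in every simulation.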

More complex surveys

If our hypothetical fishery gained a second access point and the original 100 anglers were split equally between the two, what kind of information would a creel survey capture? We could ask our surveyor to split her eight-hour workday between both access points, but she'll have to drive for 0.5 hours to get from one to the other. That 0.5 hours of driving is part of her workday, so she'll effectively have 7.5 hours to spend at access points counting anglers and collecting data.
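The start times for each stop follow from the wait times and the drive time between stops. This hypothetical helper (not a package function) assumes a fixed travel time between consecutive sites and reproduces the c(0, 4.5) schedule used below:

```r
# Hypothetical helper: start times at consecutive sites along a route, given
# the time spent at each stop and a fixed drive time between stops.
route_start_times <- function(first_start, wait_time, travel_time) {
  n <- length(wait_time)
  # Each site starts after the previous site's wait plus the drive
  first_start + c(0, cumsum(wait_time[-n] + travel_time))
}

route_start_times(first_start = 0, wait_time = c(4, 3.5), travel_time = 0.5)
# 0.0 4.5
```

The total workday is sum(wait_time) plus travel_time for each drive between stops: 4 + 3.5 + 0.5 = 8 hours here.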


start_time <- c(0, 4.5)
wait_time <- c(4, 3.5)
n_sites <- 2
n_anglers <- c(50, 50)
fishing_day_length <- 12
# sampling_prob <- sum(wait_time)/fishing_day_length

sim <- conduct_multiple_surveys(n_sims = 20, start_time = start_time, wait_time = wait_time,
                                n_sites = n_sites, n_anglers = n_anglers, 
                                mean_catch_rate = 2.5, 
                                fishing_day_length = fishing_day_length)

sim
#>         Ehat catch_rate_ROM true_catch true_effort mean_lambda
#> 1  1249.4907       2.180083   828.6670    341.1333    2.408367
#> 2   866.1782       2.623012   738.3780    303.4722    2.491475
#> 3  1007.4818       2.081340   752.3658    320.7494    2.440070
#> 4  1099.9839       2.597177   852.4705    325.7298    2.557588
#> 5   923.0335       2.045153   895.2873    336.2774    2.582202
#> 6  1203.7533       2.392358   917.6269    360.8331    2.564378
#> 7  1037.8636       2.696484   912.5469    345.1254    2.702024
#> 8  1012.5147       2.777855   958.4809    350.9344    2.620072
#> 9   919.5822       2.982438   826.7231    352.0860    2.347174
#> 10  999.7040       2.516509   763.0466    317.6327    2.440649
#> 11 1076.5503       2.149722   831.2845    356.9924    2.316774
#> 12  877.6668       3.112014   876.8068    348.6745    2.507499
#> 13 1101.8551       2.378960   824.6176    350.8859    2.336300
#> 14 1054.7562       2.108180   768.8328    336.9540    2.289195
#> 15  998.6178       2.404390   816.3672    324.1411    2.502753
#> 16 1116.4112       2.412224   859.0256    350.7947    2.325477
#> 17  909.0606       3.029545   846.3970    324.6751    2.503748
#> 18 1034.8630       2.217336   869.4471    353.8973    2.529287
#> 19  974.7500       1.803500   778.4843    334.9460    2.426595
#> 20  907.3551       2.342040   709.5490    315.6153    2.200094

mod <- 
  sim %>%
  lm(Ehat ~ true_effort, data = .)

summary(mod)
#> 
#> Call:
#> lm(formula = Ehat ~ true_effort, data = .)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -174.873  -47.208   -2.631   42.270  220.033 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)  
#> (Intercept)  -14.701    451.845  -0.033   0.9744  
#> true_effort    3.061      1.337   2.289   0.0344 *
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 93.98 on 18 degrees of freedom
#> Multiple R-squared:  0.2255, Adjusted R-squared:  0.1825 
#> F-statistic: 5.241 on 1 and 18 DF,  p-value: 0.03437

Even more simulations

Ultimately, the creel survey simulation can be as complicated as a real creel survey. If a survey requires multiple clerks, several simulations can be combined to represent multiple surveyors. To account for weekends or holidays (i.e., increased angler pressure), additional simulations with different wait times and more anglers can be built into the simulation. For example, if we know that angler pressure is 50% higher at the two access points on weekends (75 anglers per site instead of 50), we can hire a second clerk to sample on weekends, one day at each access point (eight weekend days over the four-week month), and add the weekend data to the weekday data.


#Weekend clerks
start_time_w <- 2
wait_time_w <- 10
n_sites <- 1
n_anglers_w <- 75
fishing_day_length <- 12
sampling_prob <- 8/12

sim_w <- conduct_multiple_surveys(n_sims = 8, start_time = start_time_w, 
                                  wait_time = wait_time_w, n_sites = n_sites, 
                                  n_anglers = n_anglers_w, 
                                  mean_catch_rate = 2.5, 
                                  fishing_day_length = fishing_day_length)

sim_w
#>       Ehat catch_rate_ROM true_catch true_effort mean_lambda
#> 1 339.0032       2.373315   561.7161    236.2135    2.431810
#> 2 358.9239       2.818435   708.0025    251.7850    2.844418
#> 3 339.4585       2.168630   513.7869    239.0527    2.231830
#> 4 340.7627       2.849649   685.8428    240.4456    2.642581
#> 5 359.4840       2.714199   677.5771    249.6417    2.636589
#> 6 362.2580       2.436514   615.1291    252.7900    2.448829
#> 7 350.5191       2.848595   700.9075    246.6987    2.799248
#> 8 390.8831       2.607164   707.7058    271.4466    2.621865

#Add the weekday survey and weekend surveys to the same data frame
mon_survey <- 
  sim_w %>%
  bind_rows(sim)

mod <- 
  mon_survey %>% 
  lm(Ehat ~ true_effort, data = .)

summary(mod)
#> 
#> Call:
#> lm(formula = Ehat ~ true_effort, data = .)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -204.421  -57.684   -1.193   53.919  219.623 
#> 
#> Coefficients:
#>               Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) -1332.3482   142.0247  -9.381 7.88e-10 ***
#> true_effort     6.9246     0.4508  15.360 1.48e-14 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 101.9 on 26 degrees of freedom
#> Multiple R-squared:  0.9007, Adjusted R-squared:  0.8969 
#> F-statistic: 235.9 on 1 and 26 DF,  p-value: 1.477e-14
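Because each simulated survey stands for one sampled day, one way to summarize the month is to total the daily estimates and compare them with the total true effort. This is a hypothetical base R helper, shown with made-up numbers rather than package output:

```r
# Hypothetical sketch: treat each row as one survey day and total the
# estimated and true effort for the period.
period_totals <- function(survey) {
  c(total_Ehat = sum(survey$Ehat),
    total_true_effort = sum(survey$true_effort),
    ratio = sum(survey$Ehat) / sum(survey$true_effort))
}

# Made-up example: two weekday rows and one weekend row
days <- data.frame(Ehat = c(250, 270, 340), true_effort = c(330, 350, 240))
totals <- period_totals(days)
totals
```

Since mon_survey has Ehat and true_effort columns, period_totals(mon_survey) would give the same summary for the combined weekday and weekend simulations.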

Choose your own adventure

Hopefully, this vignette has shown you how to build and simulate your own creel survey. The package is flexible enough to simulate monthly or seasonal changes in fishing day length, changes in the mean catch rate, increased angler pressure on weekends, and any number of access sites, start times, wait times, and sampling probabilities. The output from conduct_multiple_surveys() lets the user estimate the variability of the catch and effort estimates (e.g., their relative standard error) to identify the most efficient creel survey for their fishery.
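As a starting point for that comparison, here is a minimal sketch of one common definition of relative standard error: the standard deviation of the simulated estimates divided by their mean, expressed as a percentage. The function name and example values are hypothetical:

```r
# Relative standard error (%) of simulated estimates, using one common
# definition: sd across simulations / mean across simulations * 100.
relative_standard_error <- function(x) {
  100 * sd(x) / mean(x)
}

# Hypothetical values standing in for Ehat from repeated simulations
ehat <- c(240, 260, 250, 270, 230)
rse <- relative_standard_error(ehat)
rse
```

Computing relative_standard_error(sim$Ehat) for each candidate design (different wait times, numbers of sites, or clerks) gives a simple way to rank designs by precision.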