Reading from a GTFS file

The GTFS standard defines the format in which a GTFS file is shared. It consists of a set of CSV files bundled in a zip file.
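
To make the layout concrete, here is a minimal sketch that lists the entries of such an archive using only the JDK zip support, independently of this library. The object and method names (GtfsZipListing, entryNames) are hypothetical helpers introduced for illustration, and the sketch assumes Scala 2.13 or later for scala.jdk.CollectionConverters.

```scala
import java.util.zip.ZipFile
import scala.jdk.CollectionConverters._

// Hypothetical helper, not part of the library.
object GtfsZipListing {

  /** Returns the names of the entries in the zip archive at `zipPath`,
    * sorted alphabetically.
    */
  def entryNames(zipPath: String): List[String] = {
    val zip = new ZipFile(zipPath)
    // The list is fully built before the archive is closed.
    try zip.entries().asScala.map(_.getName).toList.sorted
    finally zip.close()
  }
}
```

Calling GtfsZipListing.entryNames("site/gtfs.zip") on a feed would typically return names such as routes.txt or stops.txt, one per CSV file in the archive.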

import com.mobimeo.gtfs.file._
import com.mobimeo.gtfs.model._

import cats.effect._
import cats.effect.unsafe.implicits.global

import fs2.io.file.Path

val gtfs = GtfsFile[IO](Path("site/gtfs.zip"))
// gtfs: Resource[IO, GtfsFile[IO]] = Bind(
//   source = Allocate(
//     resource = cats.effect.kernel.Resource$$$Lambda$16478/0x0000000804362840@5112a487
//   ),
//   fs = cats.effect.kernel.Resource$$Lambda$16480/0x0000000804364040@756a17a6
// )

The acquired GTFS resource gives access to the content under the read namespace. The content is streamed entity by entity, so the files are never loaded entirely into memory when reading them. The read namespace exposes functions to read from the standard files. For instance, to read the available route names from a GTFS file, one can use the routes function as follows. Note that it uses the provided data model.

gtfs.use { gtfs =>
  gtfs.read
    .routes[ExtendedRoute].collect {
      case Route(id, _, Some(name), _, _, _, _, _, _, _) => s"$name ($id)"
    }
    .intersperse("\n")
    .evalMap(s => IO(print(s)))
    .compile
    .drain
}.unsafeRunSync()
// U2 (17514_400)
// M5 (17459_900)
// 123 (17304_700)

The read namespace contains shortcuts to read entities from the standard files. You need to provide the type you want to decode the entities to (in this example ExtendedRoute, which is the route entity using extended route types). You can use your own type instead, as long as you also provide a CsvRowDecoder for it.

For instance, if you are only interested in extracting the route name and identifier, you can define your own data model for these two fields.

import fs2.data.csv.CsvRowDecoder
import fs2.data.csv.generic.CsvName
import fs2.data.csv.generic.semiauto._

case class IdNameRoute(
  @CsvName("route_id") id: String,
  @CsvName("route_short_name") name: Option[String])
object IdNameRoute {
  implicit val decoder: CsvRowDecoder[IdNameRoute, String] = deriveCsvRowDecoder
}

gtfs.use { gtfs =>
  gtfs.read
    .routes[IdNameRoute].collect {
      case IdNameRoute(id, Some(name)) => s"$name ($id)"
    }
    .intersperse("\n")
    .evalMap(s => IO(print(s)))
    .compile
    .drain
}.unsafeRunSync()
// U2 (17514_400)
// M5 (17459_900)
// 123 (17304_700)

The simplest way to get the proper decoder for your own case classes is to use the fs2-data generic module as shown in the example above.
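
Conceptually, a row decoder turns a row of cells keyed by header into a value of your type, or into an error. The sketch below illustrates that idea with plain Scala only: it models a row as a Map and is a stand-in for illustration, not the fs2-data API. The names RowDecodingSketch and decode are hypothetical, and treating an empty route_short_name cell as None is an assumption of this sketch.

```scala
// Conceptual stand-in, not the fs2-data CsvRowDecoder API.
object RowDecodingSketch {

  final case class IdNameRoute(id: String, name: Option[String])

  /** Decodes a header-keyed row.
    * Fails when `route_id` is absent or empty; an absent or empty
    * `route_short_name` is decoded as None (assumption of this sketch).
    */
  def decode(row: Map[String, String]): Either[String, IdNameRoute] =
    row.get("route_id") match {
      case Some(id) if id.nonEmpty =>
        Right(IdNameRoute(id, row.get("route_short_name").filter(_.nonEmpty)))
      case _ =>
        Left("missing required field route_id")
    }
}
```

Fields that are not mentioned, such as route_long_name, are simply ignored, which is what makes a reduced data model like the one above possible.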

Non-standard files

If you want to access files that are not part of the GTFS standard, you can use the file function, which takes the file name.

Note: The file has to be a valid CSV file.

For instance, to access a contributors.txt file that would list the contributors of the file, you can use this function.

case class Contributor(name: String, email: String)
object Contributor {
  implicit val decoder: CsvRowDecoder[Contributor, String] = deriveCsvRowDecoder
}

gtfs.use { gtfs =>
  gtfs.read
    .file[Contributor]("contributors.txt").map {
      case Contributor(name, email) => s"$name ($email)"
    }
    .intersperse("\n")
    .evalMap(s => IO(print(s)))
    .compile
    .drain
}.unsafeRunSync()
// VBB (info@vbb.de)
// Mobimeo (opensource@mobimeo.com)

Raw rows

In some cases, you might not want to deserialize the rows to a typed data model and would rather work with the raw CSV rows from the files. This is useful, for instance, when you want to modify the values of a field without validating or needing to know what the other fields contain. The GtfsFile class provides a raw variant for every file access. For instance, to extract the route names without deserializing, you can use this approach.

gtfs.use { gtfs =>
  gtfs.read
    .rawRoutes
    .map(s => s("route_short_name"))
    .unNone
    .intersperse("\n")
    .evalMap(s => IO(print(s)))
    .compile
    .drain
}.unsafeRunSync()
// U2
// M5
// 123
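
The idea of changing a single field while passing all other fields through untouched can be illustrated without the library by modeling a raw row as a plain Map from header to cell. This is a sketch under that assumption and not the row type returned by the raw functions; RawRowSketch and modifyField are hypothetical names.

```scala
// Conceptual stand-in for a raw CSV row, not the library's row type.
object RawRowSketch {

  type RawRow = Map[String, String]

  /** Applies `f` to the cell under `header` if it is present.
    * Every other cell is returned unchanged and is never inspected.
    */
  def modifyField(row: RawRow, header: String)(f: String => String): RawRow =
    row.get(header) match {
      case Some(value) => row.updated(header, f(value))
      case None        => row
    }
}
```

Because the other cells are never looked at, this kind of transformation keeps working when a file carries extra or malformed columns that a typed data model would reject.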