GTFS rule engine
To use the GTFS rule engine, add the dependency to your build file. For instance for sbt:
libraryDependencies += "com.mobimeo" %% "fs2-gtfs-rules" % "<version>"
It is cross-compiled for Scala 2.13 and Scala 3.
The GTFS rule engine provides a declarative way of describing checks and transformations of GTFS data. This is useful if you need to verify and normalize your GTFS data before using it (e.g. in a pre-processing pipeline).
The idea of this module is to provide a clear DSL to handle the data in a declarative way. To understand the rationale behind this module, you can read the blog post series we published.
The rules
The rules are grouped in sets. A rule set applies to a given file in the GTFS data (e.g. routes.txt) and defines a list of rules.
For each row in the file, the rules are tried in order. The first matching one is taken and its associated action is executed. If no rule matches the current row, the row is left unchanged.
Note: the semantics of rules in a rule set is similar to that of cases in a pattern match. The order matters, as only the first matching rule gets selected.
Rules are composed of two parts:
- A matcher, which defines which rows this rule applies to.
- An action, which defines the action to perform when a row is selected by the matcher.
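The first-match semantics described above can be sketched in plain Scala. This is only an illustrative model and not the library's implementation: `Row`, `SimpleRule` and `applyFirstMatch` are hypothetical names introduced for this example, and a row is assumed to be a simple map from column name to value.

```scala
import java.util.Locale

object FirstMatchSketch {
  // Hypothetical model: a row is a map from column name to value.
  type Row = Map[String, String]

  // Hypothetical stand-in for a rule: a matcher plus an action.
  final case class SimpleRule(name: String, matches: Row => Boolean, action: Row => Row)

  // Tries the rules in order and applies the action of the first matching one.
  // If no rule matches, the row is returned unchanged.
  def applyFirstMatch(rules: List[SimpleRule], row: Row): Row =
    rules.find(_.matches(row)).fold(row)(_.action(row))

  // Rewrites the stop name with `f`; only called on rows that have a stop name.
  private def mapStopName(f: String => String)(row: Row): Row =
    row.updated("stop_name", f(row("stop_name")))

  val rules: List[SimpleRule] = List(
    // More specific rule first: it wins for stops whose name starts with "Berlin".
    SimpleRule(
      "uppercase-berlin",
      row => row.get("stop_name").exists(_.startsWith("Berlin")),
      mapStopName(_.toUpperCase(Locale.ROOT))),
    // Fallback rule: only reached when the rule above did not match.
    SimpleRule(
      "lowercase-others",
      row => row.contains("stop_name"),
      mapStopName(_.toLowerCase(Locale.ROOT)))
  )
}
```

Both rules match a stop named "Berlin, Stieffring", but only the first one is applied, exactly as the first matching case wins in a pattern match. Swapping the two rules would make the second one unreachable.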
We can for instance define a rule set that makes the station name uppercase:
import com.mobimeo.gtfs._
import com.mobimeo.gtfs.rules._
import cats.effect._
import cats.data.NonEmptyList
import cats.syntax.all._
val rules =
  RuleSet(
    StandardName.Stops.entryName,
    List(
      Rule(
        "uppercase-stops",
        Matcher.Any,
        Action.Transform(
          NonEmptyList.one(
            Transformation.Set[IO](
              Value.Str("stop_name"),
              Expr.NamedFunction(
                "uppercase",
                List(Expr.Val(Value.Field(Value.Str("stop_name")))))))))),
    Nil)
// rules: RuleSet[IO[A]] = RuleSet(
// file = "stops.txt",
// rules = List(
// Rule(
// name = "uppercase-stops",
// matcher = Any,
// action = Transform(
// transformations = NonEmptyList(
// head = Set(
// field = Str(value = "stop_name"),
// to = NamedFunction(
// name = "uppercase",
// args = List(Val(v = Field(name = Str(value = "stop_name"))))
// )
// ),
// tail = List()
// )
// )
// )
// ),
// additions = List()
// )
As you can see, this is not the easiest way to define rules. That is why the library also provides an embedded DSL and an external DSL to help write them in a more readable way.
Create the engine
The main class you need to run rules on your data is Engine, which lives in the com.mobimeo.gtfs.rules package.
import org.typelevel.log4cats.slf4j.Slf4jLogger
// this is unsafe in production code, please refer to the log4cats documentation
implicit val unsafeLogger: org.typelevel.log4cats.SelfAwareStructuredLogger[IO] = Slf4jLogger.getLogger[IO]
// unsafeLogger: org.typelevel.log4cats.SelfAwareStructuredLogger[IO] = org.typelevel.log4cats.slf4j.internal.Slf4jLoggerInternal$Slf4jLogger@74f0cdc0
val engine = Engine[IO]
// engine: Engine[IO] = com.mobimeo.gtfs.rules.Engine@6f6b66dd
An engine can be reused with different sets of rules and GTFS files.
Execute the rules
Once you have an engine and rules, you can apply them to GTFS data using the process function.
import cats.effect.unsafe.implicits.global
import com.mobimeo.gtfs.file.GtfsFile
import fs2.io.file._
val gtfs = GtfsFile[IO](Path("site/gtfs.zip"))
// gtfs: Resource[IO, GtfsFile[IO]] = Bind(
// source = Allocate(
// resource = cats.effect.kernel.Resource$$$Lambda$16478/0x0000000804362840@6b9fbec4
// ),
// fs = cats.effect.kernel.Resource$$Lambda$16480/0x0000000804364040@1973f0ba
// )
val modified = Path("site/modified-rules-gtfs.zip")
// modified: Path = site/modified-rules-gtfs.zip
gtfs.use { src =>
  src.copyTo(modified, CopyFlags(CopyFlag.ReplaceExisting)).use { tgt =>
    engine.process(List(rules), src, tgt)
  }
}.unsafeRunSync()
def printStops(gtfs: GtfsFile[IO]) =
  gtfs.read
    .rawStops
    .map(s => s("stop_name"))
    .unNone
    .take(5)
    .intersperse("\n")
    .evalMap(s => IO(print(s)))
    .compile
    .drain
// original file
gtfs.use(printStops(_)).unsafeRunSync()
// S+U Berlin Hauptbahnhof
// S+U Berlin Hauptbahnhof
// Berlin, Friedrich-Olbricht-Damm/Saatwinkler Damm
// Berlin, Stieffring
// Berlin, Lehrter Str./Invalidenstr.
// modified file
GtfsFile[IO](modified).use(printStops(_)).unsafeRunSync()
// S+U BERLIN HAUPTBAHNHOF
// S+U BERLIN HAUPTBAHNHOF
// BERLIN, FRIEDRICH-OLBRICHT-DAMM/SAATWINKLER DAMM
// BERLIN, STIEFFRING
// BERLIN, LEHRTER STR./INVALIDENSTR.
All rule sets provided to the process function are run in order. You can have several rule sets applying to the same file; all of them are applied to that file in the order in which they are defined.
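To illustrate why the order of rule sets matters, here is a small plain-Scala model. It is a sketch under simplifying assumptions, not the engine's code: `SimpleRuleSet`, `runAll` and `mapField` are hypothetical names, and each rule set is reduced to a target file name plus a single row transformation.

```scala
object RuleSetOrderSketch {
  // Hypothetical model: a row is a map from column name to value.
  type Row = Map[String, String]

  // Hypothetical stand-in for a rule set: the file it targets and what it does to a row.
  final case class SimpleRuleSet(file: String, transform: Row => Row)

  // Rewrites `field` with `f` when the row has it, and leaves the row unchanged otherwise.
  def mapField(field: String)(f: String => String): Row => Row =
    row => row.get(field).fold(row)(value => row.updated(field, f(value)))

  // Applies, in definition order, every rule set that targets `file`.
  // Each rule set sees the rows as left by the previous one.
  def runAll(ruleSets: List[SimpleRuleSet], file: String, rows: List[Row]): List[Row] =
    ruleSets
      .filter(_.file == file)
      .foldLeft(rows)((current, ruleSet) => current.map(ruleSet.transform))

  val trimNames: SimpleRuleSet = SimpleRuleSet("stops.txt", mapField("stop_name")(_.trim))
  val addSuffix: SimpleRuleSet = SimpleRuleSet("stops.txt", mapField("stop_name")(_ + " (Berlin)"))
  val routesOnly: SimpleRuleSet = SimpleRuleSet("routes.txt", mapField("route_short_name")(_.trim))

  val stops: List[Row] = List(Map("stop_name" -> " Hbf "))
}
```

In this model, running `trimNames` before `addSuffix` yields a clean name, whereas the reverse order keeps the inner whitespace because trimming only happens after the suffix has been appended. The rule set targeting routes.txt never touches the stops.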
Default functions
The library provides a set of standard functions you can use from the rules. These are available in Interpreter.defaultFunctions. The functions available by default are:
lowercase: (str) -> str
Makes the argument lowercase
uppercase: (str) -> str
Makes the argument uppercase
trim: (str) -> str
Trims the argument
concat: (str*) -> str
Concatenates all arguments
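The signatures above can be read as functions from a list of string arguments to a string. The following plain-Scala sketch models them that way. It is an assumption-laden illustration and not the content of Interpreter.defaultFunctions: the `Fn` alias, the `functions` map and the `call` helper are hypothetical, and the exact behaviour of the real functions (locale handling, which whitespace is trimmed) may differ.

```scala
import java.util.Locale

object DefaultFunctionsSketch {
  // Hypothetical representation: a function takes its arguments as a list of strings.
  type Fn = List[String] => String

  // Builds a named one-argument function and rejects any other number of arguments.
  private def unary(name: String)(f: String => String): (String, Fn) = {
    val fn: Fn = {
      case List(arg) => f(arg)
      case args =>
        throw new IllegalArgumentException(s"$name expects 1 argument, got ${args.size}")
    }
    name -> fn
  }

  val functions: Map[String, Fn] = Map(
    unary("lowercase")(_.toLowerCase(Locale.ROOT)), // (str) -> str
    unary("uppercase")(_.toUpperCase(Locale.ROOT)), // (str) -> str
    unary("trim")(_.trim),                          // (str) -> str
    "concat" -> ((args: List[String]) => args.mkString) // (str*) -> str
  )

  // Looks a function up by name and calls it; throws if the name is unknown.
  def call(name: String, args: String*): String =
    functions(name)(args.toList)
}
```

For instance, `DefaultFunctionsSketch.call("concat", "S", "+", "U")` evaluates to `"S+U"` in this model, which mirrors how a named function receives its evaluated arguments in a rule expression.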