jthayworth/pipeline-meter-scraper

Pipeline Meter Scraper

This is a C# project that scrapes data from various pipeline meter listings, transforms the returned data into a generic model, and inserts it into a database. This implementation assumes a PostgreSQL database.

Setup

The following setup is required to run this program locally:

  1. .NET 8 SDK
  2. VS Code
  3. C# VS Code Extension
  4. .NET Install Tool VS Code Extension

Job Settings

Each job requires the following environment variables to be set. Put them in a YAML file if you are running the job in a Docker container as a cron job, or in a .env file if you are running it locally.

Generic Variables

| Variable Name | Description |
| --- | --- |
| DB_CONNECTION | Base connection string for the database |
| DB_USERNAME | The username for authenticating with the database |
| DB_PASSWORD | The password for authenticating with the database |
| CHROME_PATH | The path to the Chrome app/exe. In Chrome, go to chrome://version to find the Executable Path |

Job Specific Variables

| Variable Name | Description |
| --- | --- |
| JOB_TYPE | The type of job to run. This points to a specific runner |
| JOB_RESOURCE_URL | The URL of the page where a download button can be clicked |
| JOB_DOWNLOADED_NAME | The filename of the download, with extension (.csv) |
| JOB_RETRIES | The number of times to try the job before failing |
| JOB_RETRY_INTERVAL | The amount of time in seconds to wait between attempts |
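
As a rough illustration of how the job-specific variables might be consumed, the sketch below reads them into a settings record and applies JOB_RETRIES and JOB_RETRY_INTERVAL in a retry loop. The `JobSettings` and `Retry` types are hypothetical names introduced here, not the project's own code, and the sketch assumes JOB_RETRIES counts total attempts.

```csharp
using System;
using System.Globalization;
using System.Threading.Tasks;

// Hypothetical settings holder; the project's real type may differ.
public sealed record JobSettings(
    string JobType,
    string ResourceUrl,
    string DownloadedName,
    int Retries,
    TimeSpan RetryInterval)
{
    // Reads the variables through a lookup function so the same code works
    // with Environment.GetEnvironmentVariable or with a stub in tests.
    public static JobSettings Load(Func<string, string?> getVar)
    {
        string Require(string name) =>
            getVar(name) is { Length: > 0 } value
                ? value
                : throw new InvalidOperationException($"Missing environment variable {name}");

        return new JobSettings(
            Require("JOB_TYPE"),
            Require("JOB_RESOURCE_URL"),
            Require("JOB_DOWNLOADED_NAME"),
            int.Parse(Require("JOB_RETRIES"), CultureInfo.InvariantCulture),
            TimeSpan.FromSeconds(
                double.Parse(Require("JOB_RETRY_INTERVAL"), CultureInfo.InvariantCulture)));
    }
}

public static class Retry
{
    // Runs the job up to settings.Retries times (assumed to mean total attempts),
    // waits RetryInterval after each failed attempt, and lets the last
    // exception propagate once the attempts are used up.
    public static async Task RunAsync(JobSettings settings, Func<Task> job)
    {
        for (var attempt = 1; ; attempt++)
        {
            try
            {
                await job();
                return;
            }
            catch (Exception) when (attempt < settings.Retries)
            {
                await Task.Delay(settings.RetryInterval);
            }
        }
    }
}
```

A caller could load the settings with `JobSettings.Load(Environment.GetEnvironmentVariable)` and pass the runner's work to `Retry.RunAsync`.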

How This Works

Runners

Each site with a unique UI requires its own runner. See KinderMorganRunner for an example.

Models

Each runner requires its own model with annotations for the CSV parser. See KinderMorganRaw.cs. You can name the properties of the model however you want, but you will need to annotate them.

| Annotation | Example | Description |
| --- | --- | --- |
| Name | [Name("NAME_OF_COLUMN_IN_CSV")] | Tells the CSV parser which column in the data to map to the property in the class |
| Ignore | [Ignore] | Tells the CSV parser to ignore the property when mapping the data to the class |
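
A minimal sketch of an annotated model follows. It assumes the CSV parser is the CsvHelper package, whose `[Name]` and `[Ignore]` attributes match the annotations above; the README does not name the library. The `ExampleRaw` class and its column names are invented for illustration and are not taken from KinderMorganRaw.cs.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;
using CsvHelper;
using CsvHelper.Configuration.Attributes;

// Hypothetical raw model; the column names are made up for illustration.
public class ExampleRaw
{
    // Maps the "Meter Name" column to this property.
    [Name("Meter Name")]
    public string MeterName { get; set; } = "";

    // Maps the "Scheduled Qty" column to this property.
    [Name("Scheduled Qty")]
    public decimal ScheduledQuantity { get; set; }

    // Not present in the CSV, so the parser is told to skip it.
    [Ignore]
    public string? SourceFile { get; set; }
}

public static class ExampleCsv
{
    // Reads every row of the CSV into ExampleRaw instances.
    public static List<ExampleRaw> Parse(TextReader reader)
    {
        using var csv = new CsvReader(reader, CultureInfo.InvariantCulture);
        return csv.GetRecords<ExampleRaw>().ToList();
    }
}
```

Passing a `StringReader` or a `StreamReader` over the downloaded file to `ExampleCsv.Parse` yields one object per data row.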

TableItem Model

This model is probably not fully fleshed out. I only added the properties I thought were important, so feel free to add more if you need them. Within this class, there should be one method per runner model that takes an item of that type and fills out the TableItem with its values. See TableItem.cs for an example.
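
The per-runner mapping method might look roughly like the sketch below. Every property name here is hypothetical, `SampleRaw` stands in for a real runner model, and the static factory form is just one way to write it; TableItem.cs remains the reference for the actual shape.

```csharp
using System;

// Hypothetical runner model, reduced to two properties for illustration.
public class SampleRaw
{
    public string MeterName { get; set; } = "";
    public decimal ScheduledQuantity { get; set; }
}

// Hypothetical subset of TableItem; the real class defines its own properties.
public class TableItem
{
    public string Pipeline { get; set; } = "";
    public string MeterName { get; set; } = "";
    public decimal ScheduledQuantity { get; set; }
    public DateTime ScrapedAtUtc { get; set; }

    // One method like this per runner model: copy and normalize the raw
    // values into the generic shape that is inserted into the database.
    public static TableItem FromSampleRaw(SampleRaw raw, DateTime scrapedAtUtc) => new()
    {
        Pipeline = "sample",
        MeterName = raw.MeterName.Trim(),
        ScheduledQuantity = raw.ScheduledQuantity,
        ScrapedAtUtc = scrapedAtUtc,
    };
}
```

Keeping the conversion next to TableItem means a new runner only needs a new raw model and one additional method here.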

Dependency Injection

Once a runner and model have been created, you need to add the runner to the dependency injection setup in the ConfigureServices section of Program.cs so it can be used.

  services.AddKeyedScoped<IRunner, YOUR_RUNNER>("YOUR_RUNNER_NAME");

Tip

The name of the service supplied in the dependency injection setup is the value to use for the JOB_TYPE environment variable. For example, the KinderMorganRunner name is "kinder-morgan".
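
To show how the key ties back to JOB_TYPE, here is a small sketch that registers a runner under a key and resolves it by that same string. It uses the keyed-service APIs from Microsoft.Extensions.DependencyInjection 8; the `RunAsync` member on `IRunner`, the `ExampleRunner` class, and the "example" key are assumptions made for illustration, not the project's actual definitions.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Extensions.DependencyInjection;

// Hypothetical shape of the interface; the project's IRunner may differ.
public interface IRunner
{
    Task RunAsync();
}

// Hypothetical runner used only to demonstrate keyed resolution.
public sealed class ExampleRunner : IRunner
{
    public Task RunAsync()
    {
        Console.WriteLine("example runner executed");
        return Task.CompletedTask;
    }
}

public static class RunnerResolution
{
    // Registers the runner under a key, then resolves it with the value that
    // would normally come from the JOB_TYPE environment variable.
    public static async Task RunJobAsync(string jobType)
    {
        var services = new ServiceCollection();
        services.AddKeyedScoped<IRunner, ExampleRunner>("example");

        await using var provider = services.BuildServiceProvider();
        await using var scope = provider.CreateAsyncScope();

        // Throws InvalidOperationException when no runner is registered under jobType.
        var runner = scope.ServiceProvider.GetRequiredKeyedService<IRunner>(jobType);
        await runner.RunAsync();
    }
}
```

Calling `RunnerResolution.RunJobAsync("example")` runs the registered runner, while an unregistered key fails at resolution time, which makes a mistyped JOB_TYPE visible immediately.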
