Monday, June 25, 2012

TimeMachine Scheduler Tour: Part7


This is part 7 of 7 in a series of articles that will give you a tour of the TimeMachine Scheduler project. These articles will introduce you to the scheduler, how to load jobs and schedules, and explore some of its advanced features. For the most current and accurate instructions, please visit the ReferenceManual from the project site.


Running Multiple Schedulers in a Clustering Mode

One of the goals of TimeMachine Scheduler is to be scalable and run a high number of jobs. One way to achieve this is to run multiple schedulers on separate JVMs under a single logical scheduler data space. In previous articles, you might have noticed that our data models were designed to support multiple scheduler data spaces from the beginning. This feature is implicitly enabled by default, so you do not have to do much to take advantage of it!


When running multiple schedulers (usually on separate JVMs, though this is not required), your scheduler configuration stays the same, except that you need to pay attention to these two properties:



timemachine.scheduler.schedulerName = TimeMachineScheduler

timemachine.scheduler.nodeName = #{hostname}



To keep all data in a single logical scheduler, the schedulerName must be the same in the configuration files of all nodes that belong to the same cluster. Within the cluster, each nodeName must be unique. In fact, nodeName already defaults to your hostname. So if you run each scheduler on a separate machine, each one automatically joins the default logical scheduler named "TimeMachineScheduler" with its hostname as the node name.
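To make the naming rule concrete, here is a minimal sketch in plain Java that checks a set of node configurations: all nodes must share one schedulerName, and no two nodes may share a nodeName. It does not use the TimeMachine API; the class ClusterConfigCheck and its methods are hypothetical names introduced only for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Properties;
import java.util.Set;

// Hypothetical helper (not part of TimeMachine): validates the cluster naming rule.
class ClusterConfigCheck {
    static final String SCHEDULER_NAME = "timemachine.scheduler.schedulerName";
    static final String NODE_NAME = "timemachine.scheduler.nodeName";

    // Parse the text of one node's properties file.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalArgumentException("Unreadable config", e);
        }
        return props;
    }

    // True when all nodes share one schedulerName and every nodeName is unique.
    static boolean isValidCluster(List<Properties> nodes) {
        Set<String> schedulerNames = new HashSet<String>();
        Set<String> nodeNames = new HashSet<String>();
        for (Properties node : nodes) {
            schedulerNames.add(node.getProperty(SCHEDULER_NAME));
            if (!nodeNames.add(node.getProperty(NODE_NAME))) {
                return false; // two nodes use the same nodeName
            }
        }
        return schedulerNames.size() <= 1;
    }

    public static void main(String[] args) {
        List<Properties> nodes = new ArrayList<Properties>();
        nodes.add(parse(SCHEDULER_NAME + " = TimeMachineScheduler\n" + NODE_NAME + " = hostA\n"));
        nodes.add(parse(SCHEDULER_NAME + " = TimeMachineScheduler\n" + NODE_NAME + " = hostB\n"));
        System.out.println("valid cluster: " + isValidCluster(nodes));
    }
}
```

The check mirrors the rule only; how the scheduler itself reacts to a duplicate nodeName is not covered here.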





Using HibernateDataStore

A typical way to run a scalable data store is to use database persistence. Our HibernateDataStore lets you run any clustered scheduler configuration and store the data in a database of your choice (any database that Hibernate supports). Each scheduler node records itself at start and stop with its host IP, a timestamp, and so on. Each JobDef and Schedule is stored under its logical SchedulerData, so the namespace is already in place. Each scheduler node executes whichever schedules are due next on a first-to-poll, first-to-run basis. If there are no schedules to run, the node simply idles.


Using MemoryDataStore

Our MemoryDataStore implementation actually supports multiple schedulers as well, but this is not enabled by default. The default config uses a new instance of the MemoryDataStore for the data space, so the data is lost and reset on each scheduler start/stop. You can change this by adding this config property:

timemachine.scheduler.dataStore.memoryDataStore.useSingleton = true

This makes the MemoryDataStore service use a singleton instance of the MemoryDataStore that multiple schedulers share to store data, which makes it cluster-enabled as well. This is handy if you want to explore a Big Memory or Big Data environment.
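The difference between a fresh instance and a singleton can be sketched in plain Java as follows. This is only an illustration of the idea, not the actual MemoryDataStore code; MemoryStoreSketch and its methods are hypothetical names. Note that a plain Java singleton like this one is shared only among schedulers running in the same JVM.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for an in-memory data store (not the TimeMachine class).
class MemoryStoreSketch {
    // One shared instance per JVM, handed out when useSingleton is true.
    private static final MemoryStoreSketch SHARED = new MemoryStoreSketch();

    private final Map<String, String> jobDefs = new ConcurrentHashMap<String, String>();

    // Mirrors the idea of the useSingleton flag: shared store or a fresh, empty one.
    static MemoryStoreSketch obtain(boolean useSingleton) {
        return useSingleton ? SHARED : new MemoryStoreSketch();
    }

    void put(String id, String jobDef) {
        jobDefs.put(id, jobDef);
    }

    String get(String id) {
        return jobDefs.get(id);
    }

    public static void main(String[] args) {
        MemoryStoreSketch schedulerA = obtain(true);
        MemoryStoreSketch schedulerB = obtain(true);
        schedulerA.put("job-1", "MyJobTask");
        // Scheduler B sees the job stored by scheduler A because both share one store.
        System.out.println("B sees: " + schedulerB.get("job-1"));
        // A non-singleton store starts empty.
        System.out.println("fresh store sees: " + obtain(false).get("job-1"));
    }
}
```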



Summary

This concludes our tour of the TimeMachine Scheduler. We hope these articles have given you helpful information to explore further. Our goals are to provide a scheduler that scales well, runs many concurrent jobs, allows flexible schedules, and is easy to configure. We would love to hear your feedback. Please visit the project site and join the user forum to participate.


End of part 7. You may see previous tour.

Saturday, June 23, 2012

TimeMachine Scheduler Tour: Part6


This is part 6 of 7 in a series of articles that will give you a tour of the TimeMachine Scheduler project. These articles will introduce you to the scheduler, how to load jobs and schedules, and explore some of its advanced features. For the most current and accurate instructions, please visit the ReferenceManual from the project site.


Configuring Multiple ThreadPools

By now you know how to write your own JobTask and even write your own Schedule implementation in TimeMachine Scheduler. We will switch back to configuration for a bit to talk about how to control the job execution.

By default the scheduler has two thread pools. The first is reserved for system services and defaults to a fixed pool of only 1 thread (used by PoolingScheduleRunner). The second defaults to 4 dynamic threads and is used exclusively for running JobTasks. Here is what the default config looks like for these pools:

# System service thread pool (you only need one pool!)
timemachine.scheduler.systemThreadPool.class = timemachine.scheduler.service.FixedSizeThreadPool
timemachine.scheduler.systemThreadPool.maxSize = 1
timemachine.scheduler.systemThreadPool.threadNamePrefix = ${timemachine.scheduler.schedulerName}-System-Thread-


# Default jobTask thread pool (you may define more than one pool!)
timemachine.scheduler.jobTaskThreadPool.DEFAULT.class = timemachine.scheduler.service.DynamicThreadPool
timemachine.scheduler.jobTaskThreadPool.DEFAULT.minSize = 0
timemachine.scheduler.jobTaskThreadPool.DEFAULT.maxSize = 4
timemachine.scheduler.jobTaskThreadPool.DEFAULT.timeToLive = 300000
timemachine.scheduler.jobTaskThreadPool.DEFAULT.useShutdownNow = false
timemachine.scheduler.jobTaskThreadPool.DEFAULT.maxShutdownWaitTime = 1000
timemachine.scheduler.jobTaskThreadPool.DEFAULT.threadNamePrefix = ${timemachine.scheduler.schedulerName}-JobTask-Thread-
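If you think in java.util.concurrent terms, the defaults above are roughly comparable to the executors below. This is only an analogy under our own assumptions: the tour does not say how FixedSizeThreadPool and DynamicThreadPool are implemented, and PoolAnalogy is a hypothetical class name.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical analogy only; not the TimeMachine thread pool classes.
class PoolAnalogy {
    // Comparable to systemThreadPool with maxSize = 1: a single fixed thread.
    static ExecutorService systemPool() {
        return Executors.newFixedThreadPool(1);
    }

    // Comparable to jobTaskThreadPool.DEFAULT: minSize = 0, maxSize = 4,
    // idle threads retired after timeToLive = 300000 ms.
    static ThreadPoolExecutor jobTaskPool() {
        return new ThreadPoolExecutor(0, 4, 300000L, TimeUnit.MILLISECONDS,
                new SynchronousQueue<Runnable>());
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor pool = jobTaskPool();
        pool.execute(new Runnable() {
            public void run() {
                System.out.println("job ran on " + Thread.currentThread().getName());
            }
        });
        // Comparable to useShutdownNow = false with maxShutdownWaitTime = 1000 ms.
        pool.shutdown();
        pool.awaitTermination(1000L, TimeUnit.MILLISECONDS);
    }
}
```

With a SynchronousQueue, a fifth task submitted while four are running is rejected rather than queued; whether DynamicThreadPool queues, blocks, or rejects in that case is not stated in this tour.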

As you use the scheduler for more jobs, you might run into situations where you want certain JobTasks to run only in an isolated thread pool. The TimeMachine Scheduler lets you create multiple thread pools and match each one to job task names. To do this, you also need to configure a JobTaskPoolNameResolver that resolves a JobTask's name to one of the thread pools you configured. Here is an example scheduler configuration file that exercises this:


# Extra job tasks thread pool
timemachine.scheduler.jobTaskThreadPool.MYPOOL2.class = timemachine.scheduler.service.DynamicThreadPool
timemachine.scheduler.jobTaskThreadPool.MYPOOL2.maxSize = 4
timemachine.scheduler.jobTaskThreadPool.MYPOOL2.threadNamePrefix = MYPOOL2-Thread-


# Extra job tasks thread pool
timemachine.scheduler.jobTaskThreadPool.MYPOOL3.class = timemachine.scheduler.service.DynamicThreadPool
timemachine.scheduler.jobTaskThreadPool.MYPOOL3.maxSize = 4
timemachine.scheduler.jobTaskThreadPool.MYPOOL3.threadNamePrefix = MYPOOL3-Thread-


# Resolving multiple jobTask thread pools
timemachine.scheduler.jobTaskPoolNameResolver.poolName.MYPOOL2.matchToJobNameRexp = MyJobType2.*
timemachine.scheduler.jobTaskPoolNameResolver.poolName.MYPOOL3.matchToJobNameRexp = MyJobType3.*



The name-to-pool matching is done using Java regular expressions. The example above sets up two sets of job task names, each matched to its own pool instance. Any job name starting with MyJobType2 will be executed by MYPOOL2, while any starting with MyJobType3 will be executed by MYPOOL3. Finally, any JobTask whose name doesn't match will use the DEFAULT pool.
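The resolution logic can be sketched in plain Java like this. It is a minimal illustration, not the scheduler's actual JobTaskPoolNameResolver; the class PoolNameResolverSketch is hypothetical, and we assume full-string matching (Matcher.matches) and first-match-wins ordering, neither of which is confirmed by this tour.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch of name-to-pool resolution (not the TimeMachine class).
class PoolNameResolverSketch {
    // Insertion order is kept so the first configured pattern that matches wins.
    private final Map<String, Pattern> poolPatterns = new LinkedHashMap<String, Pattern>();

    void addPool(String poolName, String jobNameRegex) {
        poolPatterns.put(poolName, Pattern.compile(jobNameRegex));
    }

    // Returns the matching pool name, or DEFAULT when the name is missing or unmatched.
    String resolve(String jobName) {
        if (jobName != null) {
            for (Map.Entry<String, Pattern> entry : poolPatterns.entrySet()) {
                if (entry.getValue().matcher(jobName).matches()) {
                    return entry.getKey();
                }
            }
        }
        return "DEFAULT";
    }

    public static void main(String[] args) {
        PoolNameResolverSketch resolver = new PoolNameResolverSketch();
        resolver.addPool("MYPOOL2", "MyJobType2.*");
        resolver.addPool("MYPOOL3", "MyJobType3.*");
        System.out.println(resolver.resolve("MyJobType2-report"));
        System.out.println(resolver.resolve("SomethingElse"));
        System.out.println(resolver.resolve(null));
    }
}
```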

Note that a JobTask's name is optional (only the ID is required, and it is auto-generated), so to use this feature you must set a JobTask name that matches your configured pool; otherwise all tasks fall back to the single DEFAULT pool.


End of part 6. You may continue next tour, or see previous tour.


Thursday, June 21, 2012

TimeMachine Scheduler Tour: Part5


This is part 5 of 7 in a series of articles that will give you a tour of the TimeMachine Scheduler project. These articles will introduce you to the scheduler, how to load jobs and schedules, and explore some of its advanced features. For the most current and accurate instructions, please visit the ReferenceManual from the project site.



How to create Custom Schedule

We showed you how to create a custom JobTask in the previous tour. In most cases, you would write a custom JobTask and then pick one of the built-in Schedules to run it. The TimeMachine Scheduler currently provides 3 built-in Schedules: CronSchedule, RepeatSchedule and DateListSchedule. So what happens if these are not what you want, and you need a special scheduling pattern? You can certainly extend the timemachine.scheduler.Schedule class and provide all the needed methods. But writing such a Schedule implementation is much harder. Not only would you need to fully understand the base class, you would also need to deal with the persistence side in the DataStore: saving and reloading the state of your new Schedule implementation. In the case of HibernateDataStore, you would also need to add a new entity mapping file, and so on. This is a lot of work just to create a customized Schedule. Fortunately we provide something better and easier.


Our solution is the DateListSchedule. By default this schedule only lets you set an explicit list of dates to run. The scheduler simply runs on those specified dates and times, and when the schedule reaches the end of the list, it's done. The Schedule will be marked as completed and removed after the last job task has run.

However, there is another way to use DateListSchedule: use a DateListProvider to supply a new date list whenever the Schedule reaches the end of its list. You simply set a DateListProvider implementation class name on the schedule instance.



Writing and using DateListProvider

Let's say you want a job to run at midnight on the last day of every month. We will show you how this can be done with our DateListSchedule.

First, write a class that implements the timemachine.scheduler.DateListProvider interface and returns the last day of the month each time it's called, like this:


package schedulerdemo;
import java.util.*;
import timemachine.scheduler.*;
import timemachine.scheduler.schedule.*;
public class MyEndOfMonthDateListProvider implements DateListSchedule.DateListProvider {
  public List<Date> getDateList(DateListSchedule schedule) {
    List<Date> result = new ArrayList<Date>();
    Date prevDate = schedule.getPrevRun();
    if (prevDate == null)
      result.add(Schedules.endOfMonth(Schedules.time("00:00:00")));
    else
      result.add(Schedules.endOfMonth(Schedules.addMonths(prevDate, 1)));
    return result;
  }
}



Notice that we take care to use the prevRun date as the starting point to calculate the next last day of the month. If prevRun is null, we know it's the first call, so we need to create an initial date. Again, our Schedules utility class can be a great help with Java date calculations.
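For readers curious about the date arithmetic itself, here is a minimal sketch using only java.util.Calendar. It shows one way to compute "midnight on the last day of the month" and to step from one month-end to the next. It is not the implementation of Schedules.endOfMonth or Schedules.addMonths, whose exact semantics are not described in this tour; EndOfMonthSketch is a hypothetical class.

```java
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;

// Hypothetical sketch of the date math (not the TimeMachine Schedules utility).
class EndOfMonthSketch {
    // Midnight (00:00:00.000) on the last day of the month that contains 'date'.
    static Date endOfMonth(Date date) {
        Calendar cal = new GregorianCalendar();
        cal.setTime(date);
        cal.set(Calendar.DAY_OF_MONTH, cal.getActualMaximum(Calendar.DAY_OF_MONTH));
        cal.set(Calendar.HOUR_OF_DAY, 0);
        cal.set(Calendar.MINUTE, 0);
        cal.set(Calendar.SECOND, 0);
        cal.set(Calendar.MILLISECOND, 0);
        return cal.getTime();
    }

    // The month-end that follows the month of 'prevRun'.
    static Date nextEndOfMonth(Date prevRun) {
        Calendar cal = new GregorianCalendar();
        cal.setTime(prevRun);
        // Adding a month clamps the day, e.g. Jan 31 becomes Feb 28 or 29.
        cal.add(Calendar.MONTH, 1);
        return endOfMonth(cal.getTime());
    }

    public static void main(String[] args) {
        Date first = endOfMonth(new Date());
        System.out.println("first run: " + first);
        System.out.println("next run:  " + nextEndOfMonth(first));
    }
}
```

Stepping forward from the previous run, rather than from the current time, keeps the sequence stable even if a run happens late.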


Now you can use the above class in a DateListSchedule to schedule any job def. For example, you may schedule a job in a user service while the scheduler initializes, like this:

package schedulerdemo;
import timemachine.scheduler.*;
import timemachine.scheduler.schedule.*;
import timemachine.scheduler.support.*;
public class MyService extends AbstractService implements SchedulerListener {
  private Scheduler scheduler;
  public void onScheduler(Scheduler scheduler) { this.scheduler = scheduler; }
  public void init() {
    DateListSchedule schedule = new DateListSchedule();
    schedule.setDateListProviderClassName(MyEndOfMonthDateListProvider.class);
    JobDef jobDef = JobDefs.groovyJobDef("logger.info('Hello')");
    jobDef.addSchedule(schedule);
    scheduler.schedule(jobDef);
  }
}


Next, run the scheduler with the following config file:

timemachine.scheduler.userservice.myService.class = schedulerdemo.MyService




Using ScriptingDateListProvider

With the same concept as above, we also provide a built-in ScriptingDateListProvider class that lets you create a custom date list with scripting. This allows you to write a custom schedule without even recompiling a Java project!

To support ScriptingDateListProvider, the DateListSchedule has another field that can be set using the setDateListProviderData() method. This field is a string holding a data map in key=value,key2=value2 format. We need this information to tell which scriptEngineName to use and the scriptText (or scriptFile) to execute.


To mimic the above example again, we are going to switch to Groovy scripting completely, even writing the user service as a Groovy initScript, so no Java compilation is needed.


First create a scheduler config file like this:


timemachine.scheduler.userservice.myScriptService.class = timemachine.scheduler.userservice.ScriptingService
ScriptingService.scriptEngineName = Groovy
ScriptingService.initScript = config/init.groovy


Now create the config/init.groovy initScript file:


import timemachine.scheduler.*
import timemachine.scheduler.schedule.*
import timemachine.scheduler.support.*
schedule = new DateListSchedule()
schedule.setDateListProviderClassName(ScriptingDateListProvider.class)
schedule.setDateListProviderData('''
  scriptEngineName=Groovy,scriptText=
    import timemachine.scheduler.*
    prevDate = dateListSchedule.getPrevRun()
    if (prevDate == null)
      [Schedules.endOfMonth(Schedules.time("00:00:00"))]
    else
      [Schedules.endOfMonth(Schedules.addMonths(prevDate, 1))]
''')
jobDef = JobDefs.groovyJobDef("logger.info('Hello')")
jobDef.addSchedule(schedule)
scheduler.schedule(jobDef)


There, you just experienced a little bit of script within script! Pretty cool, huh? Go ahead and restart your scheduler with the above config and you should see your custom schedule in action.

As you can see, our DateListSchedule gives you a very flexible way to meet custom scheduling needs, even with dynamic scripting. Since DateListSchedule is already one of the built-in schedules, the whole persistence layer works correctly without modifying any classes or database structure.


End of part 5. You may continue next tour, or see previous tour.

Monday, June 18, 2012

TimeMachine Scheduler Tour: Part4



This is part 4 of 7 in a series of articles that will give you a tour of the TimeMachine Scheduler project. These articles will introduce you to the scheduler, how to load jobs and schedules, and explore some of its advanced features. For the most current and accurate instructions, please visit the ReferenceManual from the project site.



Developing with TimeMachine scheduler in Java

The TimeMachine scheduler is written in Java, so Java is also the primary language for extending it and writing custom job tasks. The obvious benefits of using Java over a scripting language are its speed and IDE tooling during development.

Recall from the previous tour that the scheduler allows you to create a job definition and add any schedules to be run. The actual job execution is provided by a JobTask implementation whose class name is given to the job definition. You may write your own JobTask implementation in Java. After that, you may write a user service that registers the job task with the scheduler. Through this user service layer, you can also implement event listeners that get invoked when the scheduler runs a job, adds a schedule, deletes a job def, etc.

Before we start, let me show you a Java project setup using Maven 3 so that you can use it as a demo for the rest of the tour.

Let's start by setting up a Maven project with a scheduler-demo/pom.xml file like this:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>scheduler-demo</groupId>
  <artifactId>scheduler-demo</artifactId>
  <version>1.0.0-SNAPSHOT</version>

  <build>
    <plugins>
      <plugin>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>2.3.2</version>
        <configuration>
          <source>1.6</source>
          <target>1.6</target>
        </configuration>
      </plugin>
    </plugins>
  </build>

  <dependencies>
    <dependency>
      <groupId>org.bitbucket.timemachine</groupId>
      <artifactId>timemachine-scheduler</artifactId>
      <version>1.1.1</version>
    </dependency>
    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-simple</artifactId>
      <version>1.6.1</version>
    </dependency>
  </dependencies>
</project>


The timemachine-scheduler artifact should already be in Maven Central, so cd into the scheduler-demo directory and run mvn install to get your project compiled and installed into your local repository. If you are curious, you may also run mvn dependency:tree to see what the scheduler's dependencies are. You will discover that although the scheduler uses many optional runtime dependencies, it has only a few compile-time dependencies.



Writing JobTask in Java

Now create a new src/main/java/schedulerdemo/MyJobTask.java Java file with the following:

package schedulerdemo;
import timemachine.scheduler.*;
import org.slf4j.*;
public class MyJobTask implements JobTask {
  public static Logger logger = LoggerFactory.getLogger( MyJobTask.class);
  public void run(JobContext jobContext) {
    logger.info("Hello, I am jobTask with " + jobContext.getSchedule());
  }
}

The JobTask interface is a very simple one, and the JobContext parameter gives you all the runtime information you need to query and interact with the scheduler. With the above, you can immediately run the scheduler server within your Maven project setup. But first, let's also create a scheduler config/scheduler.properties file that looks like this:

# config/scheduler.properties
timemachine.scheduler.userservice.jobLoader.class = timemachine.scheduler.userservice.JobLoaderService
JobLoaderService.01myJob.schedulerdemo.MyJobTask = CronSchedule{expression=* * * * * ?}

Now you can run the scheduler with your config by using this maven command:

$ mvn exec:java -Dexec.mainClass=timemachine.scheduler.tool.SchedulerServer -Dexec.args=config/scheduler.properties



Writing User Service in Java

Continuing with the setup above, you may explore more advanced features of the scheduler. The scheduler API exposes a simple way to let you customize the scheduler. Recall that the scheduler application itself is a container of many system services. The scheduler also has a separate container that holds user services only. To write a user service that can be registered there, you just need to implement the timemachine.scheduler.Service interface.

Try creating a new src/main/java/schedulerdemo/MyService.java Java file with the following:

package schedulerdemo;
import timemachine.scheduler.*;
import org.slf4j.*;
public class MyService implements Service {
  public static Logger logger = LoggerFactory.getLogger( MyService.class);
  public void init() { logger.info("I am initializing."); }
  public void start() {}
  public void stop() {}
  public void destroy() {}
  public boolean isInited() { return true; }
  public boolean isStarted() { return true; }
  public String getName() { return "MyService"; }
}

We also provide a convenient timemachine.scheduler.support.AbstractService class that you may extend instead. With the abstract class you only need to override the methods you are interested in, and it manages the isInited() and isStarted() states correctly for you.

With your service ready, you may register it with the scheduler by appending one more property to the earlier scheduler config:

# config/scheduler.properties
timemachine.scheduler.userservice.jobLoader.class = timemachine.scheduler.userservice.JobLoaderService
JobLoaderService.01myJob.schedulerdemo.MyJobTask = CronSchedule{expression=* * * * * ?}
timemachine.scheduler.userservice.myService.class = schedulerdemo.MyService

Now restart your scheduler, and you should see your service initialize, with log output to verify it.

The above service implementation does not do much, because it has no reference to the scheduler with which to set up or do anything. To obtain one, simply implement the timemachine.scheduler.SchedulerListener interface in your existing service class. That gives you a reference to a scheduler that's already fully initialized. Once you save the scheduler reference, you may pre-load jobs or manipulate the scheduler any way you want in your init() or start() method.

Besides the SchedulerListener, there are also JobListener, ConfigPropsListener, and CoreServiceListener interfaces that you may use in the same manner. The JobListener provides all the event callback methods you would typically want for monitoring the scheduler. Since there are many methods to implement, there is a SchedulerListener adaptor class that's ready for you to extend as well.

In this tour, I have introduced and set up a Maven-based Java project for you to explore the TimeMachine Scheduler. Go ahead and give these APIs a try and let us know what you think. If there are any features you are looking for that are not in the current scheduler, please file an Issue on the project site. We will be glad to evaluate it and look forward to improving the project with you.


End of part 4. You may continue next tour, or see previous tour.

Thursday, June 14, 2012

TimeMachine Scheduler Tour: Part3

This is part 3 of 7 in a series of articles that will give you a tour of the TimeMachine Scheduler project. These articles will introduce you to the scheduler, how to load jobs and schedules, and explore some of its advanced features. For the most current and accurate instructions, please visit the ReferenceManual from the project site.


Scheduler Data Models

The main API entry point to TimeMachine Scheduler is the timemachine.scheduler.Scheduler interface. Our default Scheduler implementation is simply a container that hosts many Services, and one of the system services is responsible for storing data. Before I cover more of these system service functions, I should introduce the data models that the store service uses.

There are four major data models that we persist and manage in the scheduler. They are listed here.
  • SchedulerData - Represents a logical scheduler. It has an id and a name. A logical scheduler may have one or more physical SchedulerNodes.
  • SchedulerNode - Represents a physical scheduler instance that runs on a JVM server node. It has an id, name, hostname, IP address, start time, stop time, etc.
  • JobDef - A job definition has all the information about a job to be run in a scheduler that belongs to a SchedulerData. A JobDef has an id, an optional name, a job task class name, and zero or more Schedules associated with it. A job definition may also contain a map of String properties that stores data specific to that job task.
  • Schedule - A schedule has all the information that tells when and how often a JobDef's job task will run. It must exist under a JobDef instance. Besides some common properties, there are 3 specific sub-classes of Schedule that we store separately: CronSchedule, RepeatSchedule, and DateListSchedule. They all share some common fields such as id, name, startTime, nextRun datetime, missedRunPolicy, etc. But each also has its own additional fields for its specific function.
Note that the scheduler's DataStore system service auto-generates an ID value for each model instance it stores. You can always uniquely identify a model object by its ID value. The names of both JobDef and Schedule are optional and are only used to help users search by a string name.


How the Scheduler Works 

When the scheduler starts, it first creates and initializes a SchedulerNode. Each SchedulerNode must belong to a SchedulerData. If this logical SchedulerData doesn't exist yet, it is created automatically; otherwise the existing one is used. Recall that in our scheduler config file, you have the option to set schedulerName and nodeName, and these two values uniquely identify the node instance.

Next the scheduler initializes and executes the ScheduleRunner system service, which checks the DataStore for any job definitions (JobDef) to be run. If any exist, it checks whether they have Schedules associated with them. For each Schedule that belongs to a JobDef, it then checks the nextRun datetime. When it's time to run, the runner service instantiates an object of the JobDef's jobTaskClassName dynamically at runtime and invokes its run() method. That's when the actual job's task, or work, begins.

Note that a JobDef does not store the task instance directly, only the class name. This lets us scale and store many job definitions rather than in-memory objects. The job task object is created at runtime, and you may completely control its creation by overriding the JobTaskFactory service.
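Creating an object from a stored class name relies on standard Java reflection. The sketch below shows the general technique only; it is not the scheduler's JobTaskFactory. The Task interface and HelloTask class are hypothetical stand-ins for timemachine.scheduler.JobTask and a real job task.

```java
// Hypothetical sketch of class-name based instantiation (not the TimeMachine factory).
class ReflectiveTaskFactorySketch {
    // Stand-in for a job task interface.
    interface Task {
        String run();
    }

    // Stand-in for a user-written job task with a no-argument constructor.
    static class HelloTask implements Task {
        public String run() {
            return "hello";
        }
    }

    // Loads the class by name and creates a new instance through its no-arg constructor.
    static Task create(String className) {
        try {
            Class<?> taskClass = Class.forName(className);
            return (Task) taskClass.getDeclaredConstructor().newInstance();
        } catch (Exception e) {
            throw new IllegalStateException("Cannot create task " + className, e);
        }
    }

    public static void main(String[] args) {
        // Only the class name needs to be stored; the object is created when it is time to run.
        String storedClassName = HelloTask.class.getName();
        Task task = create(storedClassName);
        System.out.println(task.run());
    }
}
```

Because only a string is persisted, the task class must be on the classpath of whichever node ends up running it.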

Before a JobDef's Schedule is run, the scheduler also tracks and updates its state. It goes from WAITING to STAGING to RUNNING, and then back to WAITING. Through the Scheduler interface, you may also pause or resume each Schedule individually. If a Schedule is paused, it will not be polled for job task runs.
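The state cycle described above can be pictured as a tiny state machine. This sketch only encodes the three state names mentioned in this tour and the order in which they follow each other; the enum and its next() method are hypothetical and do not come from the scheduler's source.

```java
// Hypothetical sketch of the schedule state cycle (not the TimeMachine classes).
class ScheduleStateSketch {
    enum State {
        WAITING, STAGING, RUNNING;

        // WAITING -> STAGING -> RUNNING -> WAITING
        State next() {
            switch (this) {
                case WAITING:
                    return STAGING;
                case STAGING:
                    return RUNNING;
                default:
                    return WAITING;
            }
        }
    }

    public static void main(String[] args) {
        State state = State.WAITING;
        for (int i = 0; i < 3; i++) {
            State following = state.next();
            System.out.println(state + " -> " + following);
            state = following;
        }
    }
}
```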

A Schedule may also support a missedRunPolicy that tells the scheduler what to do when nextRun has missed its time to run. When the delay exceeds the maximum allowed missed run interval, which is configurable, the scheduler uses this policy value to determine what to do. The default policy is to simply skip to the current date and time and continue. When this happens, however, we increment this Schedule's missedRunCount value so you can keep watch on it. Obviously we also track the normal Schedule.runCount as well.

In a nutshell, that's how the scheduler works internally with these data models. We have a very flexible API for managing our scheduler system services, and we also allow users to write custom services and register them with the scheduler. All of the system services have well-defined interfaces, and you are allowed to swap in any implementation you wish. For example, we provide MemoryDataStore and HibernateDataStore services so you may choose how to persist your data. All of these are configurable through the simple scheduler properties file. We shall cover some of these settings in a future tour.
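As a rough illustration of a "skip to now" policy, the sketch below checks how late a schedule is when it gets polled. If it is later than the allowed interval, the miss is counted and nextRun is moved to the current time; otherwise the run is counted. This is our own interpretation of the behaviour described above, not the scheduler's code; MissedRunSketch and all of its members are hypothetical.

```java
import java.util.Date;

// Hypothetical sketch of a skip-to-now missed run policy (not the TimeMachine classes).
class MissedRunSketch {
    private Date nextRun;
    private final long maxMissedRunIntervalMillis;
    private long runCount;
    private long missedRunCount;

    MissedRunSketch(Date firstRun, long maxMissedRunIntervalMillis) {
        this.nextRun = new Date(firstRun.getTime());
        this.maxMissedRunIntervalMillis = maxMissedRunIntervalMillis;
    }

    // Returns true when the job task should run now.
    // A real schedule would also compute the following nextRun after a run; omitted here.
    boolean poll(Date now) {
        if (now.before(nextRun)) {
            return false; // not due yet
        }
        long lateByMillis = now.getTime() - nextRun.getTime();
        if (lateByMillis > maxMissedRunIntervalMillis) {
            missedRunCount++;                    // record the miss
            nextRun = new Date(now.getTime());   // skip to the current time and continue
            return false;
        }
        runCount++;
        return true;
    }

    Date getNextRun() { return nextRun; }
    long getRunCount() { return runCount; }
    long getMissedRunCount() { return missedRunCount; }

    public static void main(String[] args) {
        MissedRunSketch schedule = new MissedRunSketch(new Date(1000L), 500L);
        System.out.println("run at t=5000? " + schedule.poll(new Date(5000L)));
        System.out.println("missedRunCount = " + schedule.getMissedRunCount());
        System.out.println("run on next poll? " + schedule.poll(new Date(5000L)));
    }
}
```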


End of part 3. You may continue next tour, or see previous tour.

Monday, June 11, 2012

TimeMachine Scheduler Tour: Part2

This is part 2 of 7 in a series of articles that will give you a tour of the TimeMachine Scheduler project. These articles will introduce you to the scheduler, how to load jobs and schedules, and explore some of its advanced features. For the most current and accurate instructions, please visit the ReferenceManual from the project site.



Scripting the Scheduler with Groovy

A scripting language is a great way to extend an application, and Java 6 or higher has the scripting engine API (javax.script) baked right in. There are many solid JVM-based scripting engines available today; Groovy, JRuby, and Jython are just a few popular open source ones. The TimeMachine Scheduler embraces the ease and flexibility of scripting. I will cover some of these features in this tour.

Starting with Java 6, the JVM comes with a JavaScript engine implementation, so there is no external dependency for it. For this reason TimeMachine defaults to "JavaScript" as its scripting engine. You may add any other script engine jars to the "lib" directory and specify the scriptEngineName parameter to change it.

We have found the Groovy scripting engine to be very productive; its syntax is very similar to the Java language itself, yet very concise and expressive. So we decided to pre-package the Groovy jars in the TimeMachine distribution zip file for user convenience. (Note that Groovy is only an optional dependency of the TimeMachine scheduler itself, and we have set up our Maven pom.xml accordingly.)

All the demo code in this tour will use Groovy. You are free to choose another engine if you want to explore further.
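If you are unsure which engine names are usable in your environment, the standard javax.script API can tell you. The sketch below looks an engine up by name and evaluates a script only when the engine is present. ScriptEngineLookupSketch is a hypothetical helper, not part of TimeMachine; also keep in mind that JDK 15 and later no longer bundle a JavaScript engine, so the "JavaScript" lookup may return nothing on a recent JDK.

```java
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

// Hypothetical helper built on the standard javax.script API.
class ScriptEngineLookupSketch {
    // Returns the engine registered under 'name', or null when none is on the classpath.
    static ScriptEngine find(String name) {
        return new ScriptEngineManager().getEngineByName(name);
    }

    // Evaluates 'script' with the named engine; returns null when the engine is missing.
    static Object evalIfAvailable(String engineName, String script) {
        ScriptEngine engine = find(engineName);
        if (engine == null) {
            return null;
        }
        try {
            return engine.eval(script);
        } catch (ScriptException e) {
            throw new IllegalStateException("Script failed", e);
        }
    }

    public static void main(String[] args) {
        String[] names = { "JavaScript", "Groovy" };
        for (String name : names) {
            System.out.println(name + " available: " + (find(name) != null));
        }
    }
}
```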


The ScriptingService

You may initialize the scheduler with a script file and let it execute to prepare jobs, or anything else you need, before the scheduler is started. Start by creating a config/scheduler.properties file like this:

timemachine.scheduler.userservice.scriptingService.class = timemachine.scheduler.userservice.ScriptingService
ScriptingService.scriptEngineName = Groovy
ScriptingService.initScript = config/myscript.groovy

In your config/myscript.groovy file, you may try this:

logger.info("Hello World!")
logger.info("I have access to " + scheduler)

Now you can fire off the scheduler:

$ bin/scheduler.sh config/scheduler.properties

You should see the scheduler start, with the hello world message printed in the log output.


The Scheduler API

From the above you can see that we give you two variables to play with in the script. The logger is a simple one, and you probably won't do anything more with it than log info messages. The more interesting one is the scheduler variable. This variable gives you full access to the scheduler; it is an instance of timemachine.scheduler.Scheduler. Let's use this variable to create a cron job in the following Groovy initScript:

import timemachine.scheduler.*
import timemachine.scheduler.schedule.*
import timemachine.scheduler.jobtask.*

jobDef = new JobDef()
jobDef.setJobTaskClass(LoggerJobTask.class)


schedule = new CronSchedule()
schedule.setExpression("* * * * * ?")
jobDef.addSchedule(schedule)


scheduler.schedule(jobDef)

The scheduler API is pretty self-explanatory, but let me be more explicit to help you along. We first imported all the packages and classes that we need, then created a job definition object and told it what task to run. We then created a cron schedule that tells it how often to run and added it to the job definition. Finally, we scheduled and stored this job definition in the scheduler. The job will run according to the schedule (every second) as soon as your scheduler starts. You may verify this through the output log.

For convenience sake, we also provide factory classes that can make above program even shorter.


import timemachine.scheduler.*
jobDef = JobDefs.loggerJobDef()
jobDef.addSchedule(Schedules.cron("* * * * * ?"))
scheduler.schedule(jobDef)




Schedule Types

Besides the CronSchedule, we also have the RepeatSchedule and DateListSchedule schedule types. We have created nice factory methods in Schedules that return each of these schedules. For example, we may create a repeat schedule that runs every 5 minutes and an explicit date list schedule in the initScript like this:

import timemachine.scheduler.*
jobDef = JobDefs.osCommandJobDef("cmd.exe /c echo 'Hello World.'")
jobDef.addSchedule(Schedules.minutely(5))
jobDef.addSchedule(Schedules.datelist(*[Schedules.datetime("01/01/2013 08:00:00"), Schedules.datetime("01/01/2014 08:00:00")]))
scheduler.schedule(jobDef)


In the above, we have scheduled one job definition with two schedules. The first one runs every 5 minutes, and the second one runs twice, on the explicitly given dates.


Note: The asterisk in front of the left bracket is Groovy's spread operator; it is needed to pass a list object into a Java variable-argument (varargs) parameter.


JobTask Types

Besides the built-in LoggerJobTask and OsCommandJobTask you have seen above, we also have a powerful ScriptingJobTask that lets you build a job task in Groovy code! This means you may add a new job without even compiling Java code! Here is an example of a Groovy initScript that creates a new "scripting" job.

import timemachine.scheduler.*

jobDef = JobDefs.groovyJobDef('''
file = new File("/tmp/counter.data")
if (!file.exists())
  num = 1
else
  num = file.text.toInteger() + 1
logger.info("Incrementing counter $num in $file")
file.write(num.toString())
''')
jobDef.addSchedule(Schedules.secondly(1)) // run every second.
scheduler.schedule(jobDef)

In the above example, we created a job that runs every second. The job task increments a counter in a file and saves it again every time the job runs.
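For comparison, the same counter logic written in plain Java looks like the sketch below. It is only meant to show what the Groovy script does, step by step, and is not wired into the scheduler; CounterFileSketch is a hypothetical class, and it uses java.nio.file, which requires Java 7 or higher.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical Java equivalent of the Groovy counter script (not a TimeMachine class).
class CounterFileSketch {
    // Reads the counter from 'file' (starting at 1 if absent), stores it back, returns it.
    static int increment(Path file) {
        try {
            int num = 1;
            if (Files.exists(file)) {
                String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim();
                num = Integer.parseInt(text) + 1;
            }
            Files.write(file, String.valueOf(num).getBytes(StandardCharsets.UTF_8));
            return num;
        } catch (IOException e) {
            throw new IllegalStateException("Cannot update counter file " + file, e);
        }
    }

    public static void main(String[] args) {
        Path file = Paths.get(System.getProperty("java.io.tmpdir"), "counter.data");
        System.out.println("Incrementing counter " + increment(file) + " in " + file);
    }
}
```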


Pretty Groovy ...

There you go. Above is your first custom job in TimeMachine Scheduler! The Groovy language is very similar to Java in syntax, yet without all the noise, so it's very productive. Groovy also accesses and integrates with existing Java APIs seamlessly, so you may access and control the scheduler with ease.

Interested? Go and download the scheduler today and give it a try!


End of part 2. You may continue to the next tour, or see the previous tour.

Thursday, June 7, 2012

TimeMachine Scheduler Tour: Part1

This is part 1 of 7 in a series of articles that will give you a tour of the TimeMachine Scheduler project. These articles will introduce you to the scheduler, how to load jobs and schedules, and explore some of its advanced features. For the most current and accurate instructions, please visit the ReferenceManual from the project site.


What is TimeMachine Scheduler

TimeMachine is a Java scheduler that can scale to run a high volume of jobs with many different types of schedules, such as repeating on a fixed interval or following CRON expressions. The scheduler can control job execution with thread pools, and it can persist job data into different kinds of storage. Users may run the built-in scheduler server with a simple configuration file, and developers may use it as a library to extend the scheduler and write custom jobs, schedules, or user services.


Getting started

The first step is to download the latest scheduler distribution and unzip it onto your system. Then fire up the scheduler with some sample jobs.

We will show commands and their output as run in a Mac OS X terminal. If you are on Microsoft Windows, a Cygwin terminal would also work. If you are a Linux user, just use a terminal.

(NOTE: If you don't feel like downloading software, you may try our online demo. You may edit the scheduler configuration directly in a web form, and it will restart the scheduler immediately upon Save.)

If you have downloaded the zip file under an "apps" directory in your HOME folder, then follow these steps to get a scheduler instance running:

$ cd $HOME/apps
$ unzip timemachine-scheduler-1.1.1.zip
$ cd timemachine-scheduler-1.1.1
$ bin/scheduler.sh config/scheduler.properties


You should see some log output on the terminal console screen like this:

22:57:08 main INFO| TimeScheduler system services initialized: [
  scheduler: SchedulerData[id=1, name=TimeMachineScheduler],
  schedulerNode: SchedulerNode[nodeId=1, name=ZEMIANs-iMac.local, ip=192.168.1.130],
  configProps: config/scheduler.properties,
  dataStore: MemoryDataStore[name=386981384],
  scheduleRunner: PollingScheduleRunner[name=1186906970],
  classLoader: SimpleClassLoaderService[name=1363910379],
  jobTaskFactory: SimpleJobTaskFactory[name=621450213],
  jobTaskPoolNameResolver: SimpleJobTaskPoolNameResolver[name=1945442111],
  jobTaskThreadPool: DynamicThreadPool[name=jobTaskThreadPool.DEFAULT],
]
22:57:08 main INFO| Scheduler[id=1, nodeId=1, nodeIp=192.168.1.130] initialized. Version=1.1.1.062720122255
22:57:08 main INFO| Scheduler[id=1, nodeId=1, nodeIp=192.168.1.130] started.




You may hit CTRL+C to exit the scheduler. The above config/scheduler.properties configuration file does not do much other than start an empty scheduler. To see more action, try the config/crontab.properties config instead. It should look something like this:


timemachine.scheduler.userservice.crontabService.class = timemachine.scheduler.userservice.CrontabService
CrontabService.01 = 0 0 * * * ?        | sh -c echo "Hourly task begins."
CrontabService.02 = 0/5 * * * * ?      | sh -c echo "Heart beat."
CrontabService.03 = 0 0/5 * * * ?      | sh -c echo "Five minutes job."
CrontabService.04 = 0 0 12 * JAN,JUN ? | sh -c echo "We should clean up every 6 months."
CrontabService.05 = 0 0 8 ? * 1-5      | sh -c echo "Every workday at 8AM."


The above configuration makes the scheduler work much like the Unix crontab service. Each entry lets you give a CRON expression, followed by a "|" separator and an OS command to run. Go ahead and replace the "echo" command in the config file with any other command that you know of (ping, for example), and then restart the scheduler. Our default log settings should display each external command's output as it executes.
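The shape of each entry, a CRON expression, a "|" separator, then a command, can be illustrated with a small parsing sketch. This is not how CrontabService itself reads its entries (its internals are not shown in this tour); the CrontabEntry class and its parse method are hypothetical names used only to make the format concrete.

```java
class CrontabEntry {
    public final String cron;
    public final String command;

    private CrontabEntry(String cron, String command) {
        this.cron = cron;
        this.command = command;
    }

    // Splits "<cron expression> | <command>" on the first '|' only,
    // so the command part may itself contain further '|' characters.
    public static CrontabEntry parse(String value) {
        int bar = value.indexOf('|');
        if (bar < 0) {
            throw new IllegalArgumentException("Expected '<cron expression> | <command>': " + value);
        }
        return new CrontabEntry(value.substring(0, bar).trim(), value.substring(bar + 1).trim());
    }

    public static void main(String[] args) {
        CrontabEntry entry = parse("0/5 * * * * ?      | sh -c echo \"Heart beat.\"");
        System.out.println("cron    = " + entry.cron);
        System.out.println("command = " + entry.command);
    }
}
```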

There are a few more configuration entries that you may add to customize the scheduler. Here are a few that we will examine closely:

timemachine.scheduler.schedulerName = TimeMachineScheduler
timemachine.scheduler.nodeName = ZEMIANs-iMac.local
timemachine.scheduler.dataStore.class = timemachine.scheduler.service.MemoryDataStore
timemachine.scheduler.jobTaskThreadPool.DEFAULT.class = timemachine.scheduler.service.DynamicThreadPool
timemachine.scheduler.jobTaskThreadPool.DEFAULT.minSize = 0
timemachine.scheduler.jobTaskThreadPool.DEFAULT.maxSize = 4
timemachine.scheduler.jobTaskThreadPool.DEFAULT.threadNamePrefix = ${timemachine.scheduler.schedulerName}-JobTask-Thread-


You are free to change the scheduler name. Together, the schedulerName and nodeName form a unique name for this scheduler instance. Both are printed when you start the scheduler, so you can verify and identify them.

We allow you to switch to a different data store, and we are using an in-memory store in this case. We also provide a HibernateDataStore that you may use to persist the data into a database of your choice. We will cover this later in the tour, but for now we will focus on the simple in-memory store.

One benefit of using our scheduler over a typical Unix cron is that you control the thread pool that executes your jobs. The last few lines of the configuration define a dynamic thread pool (it will not create threads while your scheduler is idle with no jobs to run). You may change the min and max pool size, and you may even change the thread name prefix. You can see the thread names in any JDK management tool such as jvisualvm.
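To get a feel for what these settings mean, here is a rough sketch of a pool with the same minSize, maxSize and thread name prefix, built on the JDK's own ThreadPoolExecutor. It is only a stand-in: how DynamicThreadPool is actually implemented is not shown in this tour, and the JobPoolSketch class and the 60 second idle timeout are assumptions.

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class JobPoolSketch {
    public static ThreadPoolExecutor newPool(String threadNamePrefix, int minSize, int maxSize) {
        AtomicInteger counter = new AtomicInteger();
        // Name each thread with the prefix plus a running number.
        ThreadFactory factory = task -> new Thread(task, threadNamePrefix + counter.incrementAndGet());
        // A SynchronousQueue hands each task straight to a thread, so threads are
        // created only on demand; idle threads above minSize exit after 60 seconds
        // (assumed timeout). Tasks beyond maxSize are rejected in this sketch.
        return new ThreadPoolExecutor(minSize, maxSize, 60L, TimeUnit.SECONDS,
                new SynchronousQueue<>(), factory);
    }

    public static void main(String[] args) throws Exception {
        ThreadPoolExecutor pool = newPool("TimeMachineScheduler-JobTask-Thread-", 0, 4);
        System.out.println("Threads before any job: " + pool.getPoolSize());
        pool.submit(() -> System.out.println("Running on " + Thread.currentThread().getName())).get();
        pool.shutdown();
    }
}
```

With a minimum size of 0 the pool holds no threads until the first job arrives, which matches the idle behavior described above.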

Notice one more feature of our scheduler configuration: it allows you to substitute an existing value using the ${key} format! We use this to build a thread name prefix that reuses the value you already set as schedulerName.
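A minimal sketch of this kind of ${key} substitution, written against plain java.util.Properties, could look like the following. It is an illustration of the idea only, not the scheduler's own configuration code; the PropsSubstitution class is hypothetical, it does a single pass (no nested references), and it leaves unknown keys untouched.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PropsSubstitution {
    // Matches ${some.key} and captures the key name.
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    // Replaces each ${key} in value with the matching property, if one exists.
    public static String resolve(String value, Properties props) {
        Matcher m = VAR.matcher(value);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = props.getProperty(m.group(1));
            if (replacement == null) {
                replacement = m.group(0); // unknown key: keep the placeholder as-is
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("timemachine.scheduler.schedulerName", "TimeMachineScheduler");
        System.out.println(resolve("${timemachine.scheduler.schedulerName}-JobTask-Thread-", props));
    }
}
```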


End of part 1. You may continue to the next tour.

Saturday, June 2, 2012

The HibernateDataStore preview is available

One of the reasons I delayed the first release of TimeMachine Scheduler is that I started the HibernateDataStore implementation while polishing the scheduler API. This gave me a chance to look ahead at what I need, as this is the major component of the coming 1.1.0 release. The initial Hibernate implementation is actually working now! I made a snapshot of the latest code today, so you may get a preview of how it works. Get the 1.1.0-SNAPSHOT here:
https://bitbucket.org/timemachine/scheduler/downloads

I also started a wiki doc on how to use it here:
https://bitbucket.org/timemachine/scheduler/wiki/HibernateDataStoreConfig

So give it a try and let me know what you think so far!