Cooperative Computing Lab News

Monday, August 24, 2020

CCTools version 7.1.7 released

 

The Cooperative Computing Lab is pleased to announce the release of version 7.1.7 of the Cooperative Computing Tools including Parrot, Chirp, JX, Makeflow, WorkQueue, and other software.

The software may be downloaded here:
http://ccl.cse.nd.edu/software/download


This is a bug-fix release with some new features. Changes include:

  • [Batch] Set number of MPI processes for SLURM. (Ben Tovar)
  • [General] Use the right signature when overriding gettimeofday. (Tim Shaffer)
  • [Resource Monitor] Add context-switch count to final summary. (Ben Tovar)
  • [Resource Monitor] Fix kbps to Mbps typo in final summary. (Ben Tovar)
  • [WorkQueue] Update example apps to python3. (Douglas Thain)

Thanks go to the contributors for many features, bug fixes, and tests:

  • Ben Tovar
  • Cami Carballo
  • Douglas Thain
  • Nathaniel Kremer-Herman
  • Tanner Juedeman
  • Tim Shaffer

Please send any feedback to the CCTools discussion mailing list:

http://ccl.cse.nd.edu/community/forum


Enjoy!






Posted by Benjamin Tovar at 9:25 AM

Friday, August 14, 2020

Resource usage histograms for Work Queue using Python's pandas + matplotlib

Work Queue is a framework for writing and executing master-worker applications. A master process, which can be written in Python, Perl, or C, generates tasks that are then executed remotely by worker processes. You can learn more about Work Queue on the CCL web site: http://ccl.cse.nd.edu

Work Queue can automatically measure the resources, such as cores, memory, disk, and network bandwidth, used by each task. In Python, this is enabled as follows:

import work_queue as wq
q = wq.WorkQueue(port=9123)
q.enable_monitoring()


The resources measured are available as part of the task structure:

# wait up to 5 seconds for a task to finish
t = q.wait(5)
if t:
    print("Task {id} measured memory: {memory} MB" 
            .format(id=t.id, memory=t.resources_measured.memory))

The resources measured are also written to Work Queue's transaction log. This log can be enabled when declaring the master's queue:

import work_queue as wq
q = wq.WorkQueue(port=9123, transactions_log='my_wq_trans.log')
q.enable_monitoring()

This log is also generated by Makeflow when using Work Queue as a batch system (-Twq).

The per-task resource information appears as a JSON object in transactions of the form: TASK id DONE end-state exit-code resources-exhausted resources-measured. Here is an example of what a DONE transaction looks like:

1595431788501342 10489 TASK 1 DONE SUCCESS  0  {} {"cores": 2, ...}
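As a sanity check, the JSON payload can be pulled out of such a line with a regular expression. The line below is a hypothetical complete transaction, since the example above elides the full JSON:

```python
import json
import re

# a made-up complete DONE transaction, for illustration only
line = ('1595431788501342 10489 TASK 1 DONE SUCCESS  0  {} '
        '{"cores": 2, "memory": [52, "MB"], "bandwidth": [104.1, "Mbps"]}')

m = re.match(r'\d+\s+\d+\s+TASK\s+\d+\s+DONE\s+SUCCESS\s+0\s+{}\s+({.*})\s*$',
             line)
resources = json.loads(m.group(1))

print(resources['cores'])      # → 2
print(resources['memory'][0])  # → 52, the value part of [value, "unit"]
```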

With a regular expression incantation, we can extract the resource information into Python's pandas. Say, for example, that we are interested in the memory and bandwidth distributions among the executed tasks. We can read these resources as follows:

import json
import re
import pandas as pd
import matplotlib.pyplot as plt

# the list of the resources we are interested in
resources = 'memory bandwidth'.split()
df = pd.DataFrame(columns=resources)

input_file = 'my_wq_trans.log'

with open(input_file) as input:
    for line in input:
        # timestamp master-pid TASK id (continue next line)
        # DONE SUCCESS exit-code exceeded measured
        m = re.match(r'\d+\s+\d+\s+TASK\s+\d+\s+'
                     r'DONE\s+SUCCESS\s+0\s+{}\s+({.*})\s*$', line)
        if not m:
            continue

        # the resources JSON is captured by the only
        # parenthesized group in the pattern:
        s = json.loads(m.group(1))

        # append the new resources to the panda's data frame.
        # Resources are represented in a json array as
        # [value, "unit"], such as [1000, "MB"],
        # so we only take the first element of the array:
        df.loc[len(df)] = list(s[r][0] for r in resources)
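
To try the loop above without a real workload, we can feed it a couple of synthetic log lines (the values are made up); lines that do not match the pattern are simply skipped:

```python
import json
import re

import pandas as pd

resources = ['memory', 'bandwidth']
df = pd.DataFrame(columns=resources)

# two made-up DONE transactions, plus a line the regex should skip
log_lines = [
    '1595431788501342 10489 TASK 1 DONE SUCCESS  0  {} '
    '{"memory": [40, "MB"], "bandwidth": [800, "Mbps"]}',
    '1595431788501400 10489 TASK 2 RUNNING',
    '1595431788501500 10489 TASK 2 DONE SUCCESS  0  {} '
    '{"memory": [60, "MB"], "bandwidth": [900, "Mbps"]}',
]

pattern = re.compile(r'\d+\s+\d+\s+TASK\s+\d+\s+'
                     r'DONE\s+SUCCESS\s+0\s+{}\s+({.*})\s*$')
for line in log_lines:
    m = pattern.match(line)
    if not m:
        continue
    s = json.loads(m.group(1))
    # keep only the numeric value of each [value, "unit"] pair
    df.loc[len(df)] = [s[r][0] for r in resources]

print(len(df))             # → 2
print(list(df['memory']))  # → [40, 60]
```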


For a quick view, we can directly use pandas' histogram method:

df.hist()
plt.show()
However, we can use matplotlib's facilities for subfigures and add titles, units, etc. to the histograms:

# size width x height in inches
fig = plt.figure(figsize=(5,2))

# 1 row, 2 columns, 1st figure of the array
mem = plt.subplot(121)
mem.set_title('memory in MB')
mem.set_ylabel('task count')
mem.hist(df['memory'], range=(0,100))

# 1 row, 2 columns, 2nd figure of the array
mbp = plt.subplot(122)
mbp.set_title('bandwidth in Mbps')
mbp.hist(df['bandwidth'], range=(0,1200))

fig.savefig(input_file + '.png')
Posted by Benjamin Tovar at 8:51 AM

Thursday, August 13, 2020

Tim Shaffer Awarded DOE Fellowship

CCL grad student Tim Shaffer was recently awarded a DOE SCGSR fellowship for his work titled "Enabling Distributed HPC for Loosely‐Coupled Dataflow Applications".  He will be working with Ian Foster and Kyle Chard at Argonne National Lab on data intensive applications that combine the Parsl system from Argonne and the Work Queue runtime from Notre Dame.  Congratulations Tim!


Posted by Douglas Thain at 12:35 PM

WRENCH Simulation of Work Queue

Our colleagues Henri Casanova (U Hawaii) and Rafael Ferreira da Silva (USC), along with their students, have recently published a paper highlighting their work on the WRENCH project. They have constructed a series of simulators that model the behavior of distributed systems, for the purposes of both performance prediction and education.

In their paper "Developing accurate and scalable simulators of production workflow management systems with WRENCH", they describe simulators corresponding to the Pegasus workflow management system and to our own Work Queue distributed execution framework.

Of course, any simulation is an imperfect approximation of a real system, but what's interesting about the WRENCH simulations is that they allow us to verify the basic assumptions and behavior of a software implementation. In this example, the real system and the simulation show the same overall behavior, except that the real system exhibits a stair-step pattern.


So, does that mean the simulation is "wrong"?  Not really!  In this case, the software is showing an undesirable behavior that is due either to incorrect logging or possibly a convoy effect.  In short, the simulation helps us to find a bug relative to the "ideal" design.  Nice!

https://www.sciencedirect.com/science/article/pii/S0167739X19317431

Posted by Douglas Thain at 12:20 PM

About the CCL

At the University of Notre Dame, we design software that enables computing on thousands of machines at once in order to enable new discoveries through computing in fields such as physics, chemistry, bioinformatics, biometrics, and data mining.

See our main web site for software, publications, and much more information.

