These are some research projects I've worked on:

  • Prosody Quantization

  • Q: Text-to-speech (TTS) often sounds very mechanical. One reason why TTS sounds robotic is due to poor prosody modeling. Prosody is a collection of auditory and acoustic characteristics of speech at the syllable/word/utterance level: duration, volume, pitch, stress, rhythm, etc. Often, TTS adopts a simple prosody model. Can TTS sound more natural if we overlay text with prosody from actual human speech?

    A: This project involved tagging speech with a system of annotations denoting a speaker's pitch and emphasis called Rhythm and Pitch (RaP). Manual annotation is expensive, so we were interested in determining the feasibility of automatic annotation. Feature vectors encoding prosody information about each spoken syllable were classified by a neural network. We then clustered the vectors based on their predicted classes and performed vector quantization to approximate each vector by its cluster's centroid. We resynthesized the speech based on the prosody encoded by these centroid points. We found that the quality of the resynthesized speech usually matched the quality of the input speech. This indicates that speech samples can be automatically labeled with the RaP annotations, which reduces the cost associated with manual annotation.

    Related Papers:
    The RaP (Rhythm and Pitch) Labeling System (Dilley and Brown, 2005)
    AuToBI - A Tool for Automatic ToBI annotation (Rosenberg, 2010)
    Automatic Labelling of Pitch Levels and Pitch Movements in Speech Corpora (Mertens, 2013)
    Extraction and Representation of Prosodic Features for Language and Speaker Recognition (Mary, 2011)
  • Cache Replacement Policies

  • Q: Some threads process data that is more critical than other threads' data. Our critical data comes from a simulated real-time, rate-monotonic process. Real-time systems have physical time constraints associated with their data. For example, pressure and altitude measurements from an airplane become stale after a short period of time, and so new measurements are required. How can we favor critical data in cache so that we don't have to go to main memory as often to retrieve it?

    A: In this work, we investigated two replacement policies for caches that contain data of two varieties--critical and non-critical--with the aim of improving cache performance (measured in miss counts) for the critical data while not harming the performance for non-critical data. Both of these policies were compared against the baseline performance of the Least Critical (LC) policy, which will only evict non-critical data.

    The first policy, Decay Ratio (DR), assigns a counter to each cache line that is decremented according to the criticality of the line. The cache evicts the line with the smallest counter value. DR allows critical data to stay resident in cache longer than non-critical data and also avoids the issue of LC wherein critical data fills up the cache and is never evicted. The second policy, Random Flagging (RF) is a stochastic approximation of DR that avoids the use of counters. The overhead is instead one flag bit (in addition to the criticality bit present in all three policies). RF approximates combining the gradual counter decrements into one large jump. We found that both DR and RF perform better than LC on most cache sizes and associativities.

    Related Papers:
    A Survey on Cache Management Mechanisms for Real-Time Embedded Systems (Gracioli et al, 2015)
    Cache Design for Mixed Criticality Real-Time Systems (Kumar et al, 2014)
    Scheduling Algorithms for Multiprogramming in a Hard-Real-Time Environment (Liu and Layland, 1973)
    Tightening the Bounds on Feasible Preemptions (Ramaprasad and Mueller, 2006)
  • Binary Integer Programming

  • Q: Binary integer programs (BIPs) are constrained linear optimization problems in which each variable takes either the value 0 or 1. BIPs have applications such as diet planning: eat a food (1) or not (0). However, following (or not) the plan throughout the day (i.e. setting some variables' values) can change the nutritional requirements of the diet plan for the rest of the day. How is the performance of BIP solvers affected by incrementally constraining variables to take on certain values?

    A: We generated BIPs based on food nutritional data and various popular diet plans and ran them on solvers implemented by MATLAB and Octave. These original BIP formulations were then incrementally constrained and re-run through the solvers. For each run, we determined the execution time and solution quality. Although the behavior of the times became erratic when approximately 80% of the variables were constrained, we found that in general, the execution times decreased as more variables were constrained because there were fewer free variables, even though there were more constraints to handle. We saw similar results when the variables ("foods") were clustered to reduce the dimensionality of the problem.

    Related Papers:
    Progress in Linear Programming-Based Algorithms for Integer Programming: An Exposition (Johnson et al, 2000)
    Gradient Methods for Binary Integer Programming (Huang and Wang, 2011)