Labels

Monday, June 27, 2016

[Linux] chmod

Chomd in linux

If you are familiar with unix-like system, you are definitely using a lot the command chmod to change the access permission in a file or directories.
Like this one:
$ chmod -R 777 myDirectory
means recursively adds read, write and execution permission for owner, group and the others. -R means add these permissions on all files and all sub-directories in myDirecotories.
For the series of digits 777, the first 7 means adds read, write and execution permission for the file's owner; the second 7 means adds read, write and execution permission for users who are members of the file's group; the last 7 means adds read, write and execution permission for users who are neither the file's owner nor members of the file's group.
Besides digit 7, linux has other digits to indicate different permissions. In fact these digits are octal numbers
#Permissionrwx
7read, write and executerwx
6read and writerw-
5read and executer-x
4read onlyr--
3write and execute-wx
2write only-w-
1execute only--x
0none---
(Source: wiki https://en.wikipedia.org/wiki/Chmod)

Monday, May 30, 2016

[Java] Use Markov chain to generate text

Generate text based on a Markov chain

Can we write a method to generate a text based on the texts that we have already seen ? The answer is yes, in fact, we can use a Markov chain to generate text based on texts that we used to train the Markov chain. In mathematique, we use Markov chain to describe a random process on a state space and it has a famous property that the probability distribution of transition to the next state depends only on the current state and not on the sequence of events that before it, which is called "memorylessness".
So we can split our task in two steps, training and generating:
  • In training step, we use texts to build a Markov chain
  • In generating step, we start from the first state of the Markov chain and transist to next state following the probability distribution of transition in each state untill the final state.
For the aim of simplicity, the interface of MarkovTextGenerator is given as below
public interface MarkovTextGenerator {

    /** Train the generator by adding the sourceText */
    public void train(String sourceText);

    /** Generate the text with the specified number of words */
    public String generateText(int numWords);

    /** Retrain the generator from scratch on the source text */
    public void retrain(String sourceText);
}
we can see that, training and generating steps correspond to method train and generateText separately, besides, retrain is used to retraining the existing Markov chain.

An implementation of MarkovTextGenerator

My implementation of the interface MarkovTextGenerator during the project can be found here
Besides, we have several test cases in the main method of MarkovTextGeneratorLoL class which gives us an intuitive way to know what exactly the Markov chain is playing in text generating step.

[Java] LinkedList

An introduction to LinkedList and ArrayList in Java

LinkedList and ArrayList are probably the two most commonly used built-in data structures in Java. Both of them implemented the List interface which extends the interface Collection.
  • LinkedList is a double linked list, we can do sequential access and add a new node in the middle of the list in O(1) time, however, it doesn't have a index for random access
  • ArrayList is a resizable list, we can do random access in O(1) time, however, when we want to add or delete an element in ArrayList, it can be less performance than LinkedList

The code snippet of an implementation of double linked list

The snippet below shows the LinkedList that I've implemented during the project can be found here
First of all, in order to implement a double linked list, we need to define a node class which stores datad and has a pointer to its previous and next node. Then we have a header node and tail node in MyLinkedList class, which serve as a pointer to point the first and the last node in the list. Alternatively, we can use the header node and tail node as guard nodes whose next and previous point to the first and last node in the list separately. In the same time, we see that, a double linked list has the following basic methods:
  • Appends an element to the end of the list
public boolean add(E element ) 
  • Add an element to the list at the specified index
public void add(int index, E element )
  • Remove a node at the specified index and return its data element.
public E remove(int index) 
  • Get the element at position index
public E get(int index)
  • Set an index position in the list to a new element
public E set(int index, E element) 
  • Return the size of the list
public int size() 
Also we have the toString() method, these methods permit the basic manipulation of a linked list. Compared the linked list that I implemented by myself, the bulit-in LinkedList class in Java is far more complex as it has implemented several other interfaces like the Queue interface. That is to say, when we need a queue structure, we can initialize a queue that contains integer by the following method
Queue<Integer> q = new LinkedList<Integer>()
Then we can use the bulit-in method of queue, like add(E e)peek()poll() and remove()
Instead, the following declearation of a queue is not correct
Queue<Integer> q = new LinkedList()

Thursday, May 12, 2016

[Java] JUnit basics

A small introduction to JUnit in Eclipse

There is a simple way to do unit test for Java in Eclipse, that is JUnit. The first time that I used JUnit, I was amazed by its simplicity and efficiency. As part of the assignment of the project Fil Rough in the module INF 301, I've wrote test cases to see whether units of the back-end of a betting system works well. At the same time, our professor also use the JUnit as a way to measure the results our solution. Before I learned how to use JUnit, writing a main method after the class that I want to test was really a repetitive and boring task. However, with the help of JUnit, I can write all the test case in a same place, what's more, I can have more control over the code that I've written.

JUnit in Eclipse

Firstly, to use JUnit, JUnit packages needs to be loaded, so
import org.junit
needs to be written as first line in our test class. Below is a concrete example of how we use JUnit to do unit test.
import java.util.LinkedList;

import org.junit.*;

public class MyLinkedListTester {

    private static final int LONG_LIST_LENGTH =10;
    MyLinkedList<Integer> emptyList;
    MyLinkedList<Integer> longerList;

    /**
     * run every time before each test
     */
    @Before
    public void setUp() throws Exception {
        // Initialize 2 list for the following test
        emptyList = new MyLinkedList<Integer>();
        longerList = new MyLinkedList<Integer>();
        for (int i = 0; i < LONG_LIST_LENGTH; i++)
        {
            longerList.add(i);
        }
    }

    /**
     * run only one time before test
     */
    @BeforeClass
    public void setUpClass()
    {
    }

    /** Test if the get method is working correctly.
     */
    /*You should not need to add much to this method.
     * We provide it as an example of a thorough test. */
    @Test
    public void testGet()
    {
        //test empty list, get should throw an exception
        try {
            emptyList.get(0);
            fail("Check out of bounds");
            //System.out.println("Check out of bounds" + " - failed");
        }
        // test longer list contents
        for(int i = 0; i<LONG_LIST_LENGTH; i++ ) {
            assertEquals("Check "+i+ " element", (Integer)i, longerList.get(i));
        }
    }

    /** Test removing an element from the list.
     * We've included the example from the concept challenge.
     * You will want to add more tests.  */
    @Test
    public void testRemove()
    {
    }

    /** Test adding an element into the end of the list, specifically
     *  public boolean add(E element)
     * */
    @Test
    public void testAddEnd()
    {
    }

    /** Test the size of the list */
    @Test
    public void testSize()
    {
    }

    /** Test adding an element into the list at a specified index,
     * specifically:
     * public void add(int index, E element)
     * */
    @Test
    public void testAddAtIndex()
    {
    }

    /** Test setting an element in the list */
    @Test
    public void testSet()
    {
    }

    /**
     * run every time after each test
     */
    @After
    public void tearDown()
    {
    }

    /**
     * run every only one time after test
     */
    @AfterClass
    public void tearDownClass()
    {
    }

}
For the basic use of JUnit, 5 annotations needs to be learned, namely @Before, @BeforeClass, @test, @After and @AfterClass. Here are the signatures
@Before
public void setUp() throws Exception
{
    // is run before each test to initialize variables and objects
}

@BeforeClass
public static void setUpBeforeClass() throws Exception
{
    // is run only once before the class to initialize objects
}

@test
public void test<feature>()
{
    /* <feature> denotes a feature that we want to test
    * if we want to test Remove method of the class
    * the method name can be testRemove()
    */

}

@After
public void tearDown() throws Exception
{
    /* This method can be useful if your test constructed something which needs
    * to be properly torn down, for example, the database
    */
}

@AfterClass
public static void tearDownAfterClass() throws Exception
{
    /* This method can be useful if your setUpClass() method constructed something
    *  which needs to be properly torn down, for example, the database
    */
}
We notice that @Before, @BeforeClass, @After and @AfterClass are help method that help us to setup and clean the test environment, the most important is @test, there are serveral ways to test a method of a class, for the basic use of JUnit, we need at least know the following 2 method.

To test corner cases

we can test corner cases by using fail() method, here the method that we want to test is emptyList.get(0);. As we know, we can not get element from an empty list, emptyList.get(0); should throw an exception. However, if no exceptions are thrown during this method, fail() method would be called to let us know the emptyList.get(0); must has something wrong .
try {
            emptyList.get(0);
            fail("Check out of bounds");
            //System.out.println("Check out of bounds" + " - failed");
        }
        catch (IndexOutOfBoundsException e) {
            //System.out.println("Check out of bounds" + " - passed");

To test common cases

we can use assertEquals() to test common cases, assertEquals() takes 3 input, the first is a Sting, test message can be written here to let us know which test is runned, the second is expected value, the third is actual value that we get from the method that we want to test.
// test longer list contents
        for(int i = 0; i<LONG_LIST_LENGTH; i++ ) {
            assertEquals("Check "+i+ " element", (Integer)i, longerList.get(i));
        }
So in this way, we can test if the output of a certain method meets our expectation, if the expected value doesn't equal to the actual value, we can get a failure from JUnit control board. The signature of assertEquals() are shown below
assertEquals(java.lang.String message, long expected, long actual)
                Asserts that two longs are equal

Sunday, May 1, 2016

[Java] Basics of Regular Expression

A little Introduction to Regular Expression

Regular Expression is an important concept in Computer Science, I remember that in the course Logique et langages, my professor has already taught me how to use regular expressions to determine whether an input string meets a certain standard. For example, in the course project Systèm de paris sportifs, I had to write a module to determine whether the password of a new registered user contains only alphabets and numbers. If the password meets the standard, the account of the user will be created, if not, a warning message will be sent to the user. Finally, I had written a regular Expression like [a-zA-Z]+. Obviously this is not a practical answer, especially when we compare it with real world password restrictions like that of in social network which require a password must contain 8 characters and at least one number, one letter and one unique character.

A simple way to spilt words from a sentence

The simplest way to split words from a string like "This is simple test of split string." is to use the built in method in String data type, namely, .split(String regex). The method takes in regular expression, characters that match the regular expression will be used as delimiters to split the string. For example, if we use .split(" "), the string will be split every time there is a space, as shown below.

code snippet

public class SplitStringTest {
    public static void main(String args[]) {
        String test = "This is simple test of split string.";
        String words[] = test.split(" ");
        for (String word: words) {
            System.out.println(word);
        }
    }
}

output

This
is
simple
test
of
split
string.

An advanced method to spilt a string

The built-in method .split(String regex) can split a sentence if we correctly give the delimiter like a white space. However, what if more than one white spaces can be found between the words in that sentence? Like "This is simple test of split string.", then the output will be like below
This

is
simple
test
of
split
string.
Clearly, the second output is not what we want. We can solve this problem by further process the output string array. However, we have a more general method to solve this kinds of problem. That is to directly use the regular expression to find strings that we want to match, but we need to follow the following steps:
  1. create a String object and put the regular expression in it;
  2. create a Pattern object and use the class method of Pattern class Pattern.compile(String regex) to transform the regular expression into the Pattern object;
  3. create a Matcher object and use the instance method of Pattern class .matcher(Sting text) which will create a Matcherobject that will match the given input against this pattern;
  4. create an array of String to store words that match the regular expression by using the instance method of Matcher class.group() that returns the matched string once at a time and using the .find() method which is used as a loop condition like .hasNext() for an ArrayList

code snippet

import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTest {
    public static void main(String args[]) {
        String text = "This is a simple regex test.";
        ArrayList<String> words = new ArrayList<String>();
        String regexExpression = "[a-zA-Z]+";
        Pattern pattern = Pattern.compile(regexExpression);
        Matcher match = pattern.matcher(text);
        while (match.find()) {
            words.add(match.group());
        }
        for (String word: words) {
            System.out.println(word);
        }
    }
}

output

This
is
a
simple
regex
test

More about the Regular Expression

There are three ways to combine a basic regular expression: 1. Repetition: A matches character AA+ matches one or more character AA matches zero or more character A. 2. Concatenation: AB matches character AB 3. Alternation: A|B matches character A or character B
Besides
  • [] matches "anything in the set". Ex: [abc] matches a, b or c
  • [^] matches "anything NOT in the set". Ex: [^abc] matches Any character except a, b, or c
It's not easy to remember all the regular expressions, a reference document is necessary when we write this kinds of expressions, here is a detailed list of all supported regular expressions in Java SE 7
A website that can check your regular expressions can be found here, it's really pratical if we want to check if the regular expression fits our needs.

Friday, April 15, 2016

GitHub可视化工具

GitHub可视化工具

作为版本管理仓库中的龙头老大,GitHub有着千万量级的Project,实际上,由于Project Open DataDocker以及科技巨头们的青睐,在2013年的时候,GitHub上Repository的托管数量就已经突破千万了。详细见下表
图片来源:https://github.com/blog/1724-10-million-repositories
在如数众多的Project中,尤其是开发周期长达几年的Project中提取我们感兴趣的信息是很困难难的。比如直观的了解一个Project的生命周期,有多少contributer在什么时间做了多少contribution。
GitHub的可视化工具可以帮我们做到这一点,最近发现了一个有意思的GitHub可视化工具,输入待分析的GitHub用户名后可以看到该用户下所有公共Respository,并且根据使用语言进行聚类可视化

接下来,我们可以选择一个感兴趣的Respository继续进行分析,比如我们在这里选中Docker


点击播放就可以看到根据时间轴的Project动态变化了,比如什么时候有什么人加入了近来,做了什么贡献等等


然后发现Docker中的不少贡献者都是国人欸,比较亮点是发现了不少华为员工^_^

Thursday, April 14, 2016

[Java] 参数传递

[Java] 参数传递

Java参数传递说难也难说简单也简单,今天做了一道编程题,才觉得掌握了Java的参数传递。
Java的参数传递应该分为两部分来讲,一是基本类型的参数传递,比如int, boolean, double, float等类型,这里要特别说明String不是基本类型。而是对象作为参数传递,比如各种内建类和用户自己实现的类。

基本类型参数传递

这个比较好讲,类比C语言,基本类型作为参数传递时,是传递值的拷贝,无论在函数中如何改变这个值的拷贝,原值是不会改变的

对象作为参数传递

这里就比较令人迷惑了,今天写了一个用遍历单项链表的方法找末尾元素和长度的函数,输入为单项链表的表头元素
 LinkedListNode  tail = findEndNode(first)
 public static int findEndNode(LinkedListNode tail) {
  int length = 1;
  while (tail.next != null) {
   length++;
   tail = tail.next;
  }
                return int;
 }
本以为函数不用返回找到的尾项,输入表头后直接变末尾元素,可惜不是这样。输出后发现表头没有任何改变,和基本类型作为参数传递的情况一样。
实际上Java中对象作为参数传递时,是把对象在在内存中的地址拷贝了一份给了参数,也就是说这里参数传递时的tail是first地址的拷贝,对tail进行操作四号不会影响到first的地址
进行改进后
 public static class Result{
  public LinkedListNode tail;
  public int length;
  public Result(LinkedListNode tail, int length) {
   this.tail = tail;
   this.length = length;
  }
 }
 public static Result findEndNode(LinkedListNode tail) {
  int length = 1;
  while (tail.next != null) {
   length++;
   tail = tail.next;
  }
  return new Result(tail,length);
 }
这样才可以同时返回单项列表长度以及末尾元素